ABSTRACT

The nature of educational evaluation and its interaction with policy
in six state departments of education are examined. Case reports of
research and evaluation units are presented for Virginia (by Gerald W.
Bracey), Michigan (by David L. Donovan and Stanley A. Rumbaugh),
Washington (by Alfred F. Rasp, Jr.), South Carolina (by Paul D.
Sandifer), Wisconsin (by James H. Gold), and Oregon (by Gordon Ascher).
Analyses and commentaries on the common themes of the reports include
"Policy and Evaluation: A Conceptual Study," by Thomas F. Green and "The
Context of Evaluation Practice in State Departments of Education," by
Nick L. Smith. Future prospects are considered in "The Need for New
Approaches in State Level Evaluations," by Alexander I. Law, arguing for
agency innovation, and "Problems in the Implementation and Acceptance of
New Evaluation Approaches in State Departments of Education," by Norman
Stenzel,
summarizing barriers to the improvement of practices. (CM; 

PREFACE



In spite of widespread interest in improving evaluation and policy
analysis in the social sciences, there has been remarkably little study of
the daily practice of these activities at state and federal levels. Further,
much of the literature on evaluation and policy analysis has been written
by university-based commentators who have an external perspective on
governmental processes. The views of firing-line practitioners are often
strikingly different from the descriptions provided by external observers.
We believe it is crucial to understand the insiders' views of the daily
practice of evaluation and policy analysis if these activities are to be
improved.

In this volume we present discussions of the nature of evaluation and
policy analysis in state departments of education. Eight of the ten
chapters have been written by administrators and professional staff who
worked in the research and evaluation units of state departments in
California, Illinois, Michigan, Oregon, South Carolina, Virginia, Washington,
and Wisconsin. Drawing from their own personal experiences, these
authors describe the nature of evaluation and policy-making activities
within state departments of education, focusing on the interaction between
the two activities. They address such questions as:

• How does policy influence evaluation? For example: How
does policy determine what is evaluated? Does policy
somehow influence evaluation methods? How does policy
influence the nature and organization of the evaluation
unit? To what extent do policymakers call on evaluators to
help them with their policy problems? What evaluation
activities require policy information? What percentage of
evaluators' time is devoted to policy questions?

• How does evaluation influence policy? For example: To
what extent is evaluation used in policymaking? In what
ways do evaluation results influence policy deliberation? In
what ways do evaluators initiate contact with
policymakers? What policy problems require the use of
evaluation information? How is evaluative information
communicated to policymakers? What percentage of
policymakers' time is devoted to addressing evaluation
questions?

From these case reports there emerges a clearer picture of the
programmatic, political, economic, and technological environment in which
evaluation and policy analysis occur. The accounts included here portray
evaluation and policy analysis as they are routinely performed rather than
as external commentators think they ought to be conducted.

We hope that these insiders' views of state-level activities will
increase understanding of actual practice and serve empirically to ground
further attempts to improve the theory, methods, and practice of
evaluation and policy analysis at the state level. This volume is timely
because of the recent attempts by the new federal administration to
re-position some of the resources and responsibility for evaluation,
analysis, formation and implementation of policy at the state level. This
volume is important because it (a) provides information on the critical
interaction of evaluation and policy analysis at the state level, and
(b) focuses on insiders' views of state level policy and evaluation rather
than on external assessments. The volume is also noteworthy for its use of
a novel self-portrayal in which practitioners prepare case reports which
substantive specialists then analyze and comment upon.

Part I of this volume contains six reports by individuals who managed 
or worked within research and evaluation units in state departments of
education in Virginia, Michigan, Washington, South Carolina, Wisconsin, and
Oregon. Each author describes the nature and operation of his evaluation
unit, emphasizing how policy influences evaluation and how evaluation
influences policy within his setting. These reports provide inside views of 
the daily operation of policy and evaluation in state agencies as well as 
views of interagency and agency/public interactions. 

In Part II, two substantive specialists provide integrative analyses and
commentaries on the preceding six chapters. Thomas F. Green, a
philosopher from Syracuse University who conducts federal-level policy
analyses, discusses the view of policy implicit in these state level reports.
Nick L. Smith, an evaluation researcher from the Northwest Regional
Educational Laboratory, integrates the six reports to describe the
contextual forces which influence state level evaluation operations. These
two chapters provide continuity and integrate the preceding reports,
drawing the reader's attention to common elements and themes.

Having presented six intensive case descriptions of the interaction of 
policy and evaluation activities in Part I and then integrative analysis of 
these cases in Part II, the volume concludes in Part III with a look at the
future of evaluation and policy analysis in state departments of education.
Alex Law of the California State Department of Education summarizes his
view of the nature of state agency operations and argues the need for
innovative approaches to state level practice. In the final chapter, Norman
Stenzel of the Illinois Office of Education reports on a study of barriers to
innovation in state-level policy and evaluation activities, summarizing the 
major impediments to the improvement of agency practice. 

The material presented in this volume was developed as part of a 
programmatic effort to improve the conduct of evaluation through the
study of practice and the development of new evaluation methods. The
work was performed under the auspices of the Research on Evaluation
Program of the Northwest Regional Educational Laboratory in Portland,
Oregon. Supported by a multi-year contract from the National Institute of
Education, the Research on Evaluation Program has been developing new
evaluation methods for use principally in state education departments and
local school districts. In order to improve evaluation methods, however,
we need to understand the nature of the practice within which such
methods are used. The papers included in this volume were therefore
commissioned in order to learn more about evaluation and policy in state
departments of education.

This volume brings together insiders' views of the interaction of policy 
and evaluation at the state level, integrative analyses, and prospects for 
the future. We think it will be of use to evaluation practitioners, policy 
analysts, and administrators who interact with state-level operations as
well as to students in these areas, who are likely especially to value its
portrayal and analysis of standard agency practice.

We happily acknowledge the support given the work in this volume. 
The program staff of the National Institute of Education which funded this 
work, especially Daniel Antonoplos and Charles Stalford, provided 
important encouragement and direction to our efforts. We also appreciate
the thoughtful guidance and suggestions provided by the Research on
Evaluation Program's National Advisory Panel:

Adrianne Bank, University of California, Los Angeles
Joan Bollenbacher, Cincinnati Public Schools (Retired)
Egon Guba, Indiana University
Vincent Madden, California State Department of Education
Jason Millman, Cornell University
Stacy Rockwood, New Orleans, Louisiana
Blaine Worthen, Utah State University

We gratefully acknowledge the excellent manuscript preparation skills
of Vicky Kerr, who saw the volume through earlier revisions on our word
processing system, and especially Judy Turnidge, who, as production
manager for the book, supervised all phases of the book's development and 
production. Her professional assistance was of immeasurable help. Our 
special thanks go to our wives, Denny and Karen, for their encouragement
and support. 

To these individuals and the many others who assisted in the successful 
completion of this work, we give our sincere thanks.

Nick L. Smith
Darrel N. Caulley
Portland, Oregon



PART I

The Interaction of Evaluation and Policy:
Case Reports

The first six chapters of this volume were written by professional staff
members of research and evaluation units within state departments of
education in Virginia, Michigan, Washington, South Carolina, Wisconsin, and
Oregon. Each author provides a short description of the administrative
structure of his state department of education, including how the research
and evaluation unit fits into this structure.

The major parts of the reports deal with how evaluation affects policy
and how policy affects evaluation. Such questions as the following are
addressed: How does policy determine what is evaluated? Does policy
somehow influence evaluation methods? How does policy affect the nature
and organization of an evaluation unit? To what extent is evaluation used
in policy making? How is evaluative information communicated to policy
makers?

The six chapters confirm that in many instances policy determines the
course of evaluation and that evaluation affects policy. But the impact is
not equal in both directions: policy has a greater effect on evaluation than
evaluation has on policy. Gerald Bracey (Virginia) mentions this in his
paper: "Where (policy and evaluation) have been related, the relationship
has generally been unidirectional, with policy having a significant bearing
on evaluation, but not vice versa." The authors have used much more space
describing the effect of policy on evaluation than describing the effect of
evaluation on policy. From reading the chapters which follow, it is clear
that the authors could readily give examples of policy affecting evaluation
but not so readily give examples of evaluation affecting policy.

The six reports show that, as one might expect, evaluation is a
politically inspired endeavor. Whenever an evaluation affects the future
allocation of resources, and hence changes power relationships, it is a
political activity. The individual authors realize that the programs they
evaluate are not only spawned by political forces but also have political
consequences.

All authors have the problem of knowing what "policy" is. Gerald
Bracey (Virginia) reviews the confusion surrounding the term "policy,"
searches for a definition of the term, but gives up the attempt. Green,
whose paper appears in Part II, states that "there is probably no single
definition of 'policy' adequate to capture the full range of ordinary usage."
The reader should be aware that the use of the term "policy" is not
consistent throughout the book.

All six chapters reflect the view of the state legislature as being an
important policy-making body. The authors see the law as reflecting policy
that affects the evaluation efforts of State Education Agency (SEA)
evaluation units. The other strong policy-making body that has affected
evaluation is the Federal government; most authors mention Title I
legislation as significantly affecting evaluation in their state.

In describing their evaluation units and the functions of the units, the 
authors use a historical perspective, describing how legislation historically
has influenced the nature of evaluation units and their work. What the
research and evaluation units are today is largely a result of past
legislation. Through the use of historical examples, these reports show
that policy affects evaluation in two major ways. Policy not only
determines what is to be evaluated but also shapes and dictates what
evaluation methodology will be used.



CHAPTER 1

The Virginia Experience

Gerald W. Bracey 

Judging from the literature I have read, evaluation and policy appear
to have been only tangentially or occasionally related at state,
federal, or even local levels. Where they have been related, the
relationship has generally been unidirectional with policy having a
significant bearing on evaluation, but not vice versa. Explanations for this
phenomenon vary. Fox (1977), reviewing 10 years of Teacher Corps
evaluations, concluded that the problem lay primarily with the methods
chosen. According to Fox the "standard experimental design" as defined by
Astin and Panos (1971) was often used and was invariably inappropriate.
Fox recommended to evaluators the "model" by Parlett and Hamilton
(1976) described as "evaluation as illumination." Certainly evaluators have
been misled by certain methods. However, in this case, I feel that a
careful reading of Fox's paper leads one to conclude that the policy
changes affecting the Teacher Corps would have taken place no matter
what evaluation model had been chosen. Policy changes were determined
by other factors and the outcomes of evaluations were simply irrelevant to
such decisions. Indeed, House (1979) has argued that the Follow Through
Program in particular and federal programs in general have been evaluated
in such a way that the evaluations are bound to be narrow, trivial, and
hence irrelevant. Many would argue, I among them, that the current Title I
evaluations will be another case of a dysfunction and disjunction between
evaluation findings and policy changes.

All of this prologue is simply to document that the relationship 
between evaluation and policy has not been a happy one, at least as viewed 
from the perspective of the evaluators. Alkin and Daillak (1979) recently
lamented:

There have been great hopes for evaluation, not only among
evaluators themselves, but also among many other educators,
elected officials and the public. Yet these hopes have dimmed.
It was hoped that evaluation information would help planners,
administrators and policy makers both by improving individual
programs, and by aiding in choices among programs. The reality,
we are told by almost all observers, is that evaluation has had
little influence on educational decision making, and evaluation
information is largely ignored (p. 41).

Although most evaluators (and even policy makers) reading that
statement would probably give general assent, the statement by itself has
several problems. It does not delineate types of evaluation or define
policy. A necessary first step in understanding the relationship between
"evaluation" and "policy" in the Virginia Department of Education or
anywhere else is to define both evaluation and policy.

At one level any decision that determines policy is based in part on
information that could be called evaluative. However, such information
could be (and often is) derived largely from personal experience. Pipho
(1978) reported that 80% of all legislation introduced with respect to
Minimum Competency Testing Programs resulted from the immediate
experience of a legislator with either his own family or that of neighbors or
relatives. This datum alone should give some indication of the relative
force of large evaluation projects on policy decisions. While these
legislators are clearly evaluating information by using personal experiences
as a source of evaluative information, such experiences do not constitute
an adequate definition of evaluation for the purposes of this paper.2

For the purposes of this paper, "evaluation" or "evaluative information"
will refer to any of the ten categories of studies classified by Webster and
Stufflebeam (1978) as quasi-evaluation or true evaluation. While it
would be fascinating to include a discussion of what those authors refer to
as pseudo-evaluation (propagandistic studies designed to support
predetermined policies), such discussion would be difficult as, by definition,
the goals are predetermined and part of the relevant information is either
not collected or deliberately concealed by the perpetrators of the study.
Note that one of the ten categories itself is labeled "policy studies," which
foreshadows what will become clear: that the relationship between the two
concepts is sometimes intricate and simultaneously elusive.

If one can obtain a frame of reference for "evaluation" by noting a
single synoptic paper, such is not the case with "policy." Attempts by the
author to obtain concrete, short definitions of policy have failed. The
general thrust of the answers has been, "I can't define policy but I will know
one when I see one." Dictionaries are no help and articles about policy
provide little guidance because the word is used in so many contexts. It is
endowed, as philosophers of science put it, with a great deal of "surplus
meaning." Generally, people responding to the question, "What is Policy?"
discuss an overall philosophy, ideology, or plan of action, sometimes with
clear budgetary implications, sometimes without.

After a number of these discussions, I imagined a continuum running 
from broad-scoped statements of "policy" at one end to specific statements
of administrative procedures at the other. I was, however, unable to
determine any point along that continuum where policy clearly became
procedure or vice versa. Indeed, it will be seen during this chapter that
one source of difficulty in relating policy and evaluation is a confusion
among all the actors involved as to where policy ends and administration
begins. A result of this confusion is that administrative procedures, best
left as such, get elevated to the level of policy making, with a resultant
territorial intrusion by one group on the other. Similarly, policies get
enacted with no clear procedural implications. Almost any course of
action can be restated as policy. "It is the policy of the United States to
contain communism wherever it should appear." Few would argue that this
is not a policy of the U.S. Government, but its procedural implications are
by no means clear; the possible translative strategies range from providing
effective demonstrations of the superiority of a capitalist system, to
providing assistance to any government promising to fight against
communist activity in that country, to physical obliteration of all nations
professing themselves to have communist governments.

In other cases to be discussed, it appears that policy statements often
make an appearance to justify administrative procedures already in place,
unless, of course, those procedures themselves are under attack.

Finally, in some cases the policy statement is redundant with or a 
gratuitous addition to a procedure. The 1978-80 Standards of Quality for 
Public Schools in Virginia, to be discussed in detail below, illustrate such a 
case of redundancy. One part of one standard states, "It is the policy of
the Commonwealth that the awarding of a high school diploma shall be
based upon achievement." There follows the standard that puts into place
Virginia's graduation competency testing program. The policy statement is
unnecessary and is unattached in any causal way to the testing program. In
recognition of this fact, some legislators unsuccessfully attempted to have 
the policy sentence removed before the standard was enacted into law. 

It is comforting, if not illuminating, to note that others have struggled
with definitions of policy issues and it is worthwhile to examine their
struggles.

Berlak (1970) pointed out that evaluators needed to know whether they
were operating in an area of programmatic impact or policy impact
and act accordingly. For Berlak, a policy issue has four criterial
attributes:

1. It directly or indirectly alters the power relationship 
between the citizens and the state. 

2. It affects immediately or in the long run the status a person
has and the power he can exercise within the social system.

3. It increases or decreases political or social tensions as a 
primary outcome. 

4. It alters the self concept of the individual. 

While not a model of clarity, nor in several instances easily amenable 
to empirical tests, these considerations are an improvement over the 
classical political definition of policy as the "application of reason and 
evidence to choose among program alternatives." Perhaps the best
discussion of policy is by Mann (1975) who chooses, like Berlak, not to
define policy but to describe policy issues in education again in terms of
criterial attributes. For Mann, policy issues have five characteristics.

1. Policy problems are public. This might at first seem
unnecessary to state, as we are dealing with public
education and hence all things are in public domain, but in
fact they are not. Certain "policy" discussions revolve
around what is the proper goal of public education. Which
of the desired outcomes of childhood are the domain of the
school and which are those of family and other institutions?
Certainly much heat, if not light, has been generated over
where the responsibility of schools ends.

2. Policy problems have important consequences. They
increase tensions among political groups and their resolution
directly affects the lives of a large number of people or a
small number of people in large ways.

3. Policy problems are complex. They have political,
economic, and moral dimensions which, of course, do not
operate independently but interactively.

4. Policy problems involve large amounts of uncertainty. This
almost follows from #3 above. If the state decides to
allocate more funds to school districts with low scoring
children in order to hire more teachers, what will be the
outcome? Clearly this cannot be known in advance,
although various "scenarios" can be depicted with greater or
lesser sophistication.

5. Policy problems are viewed differently by those with
different interests or ideologies. Again this appears trivial
or at least axiomatic but it is important to state. If all
people agree on what is to be done, then it is no longer a
policy problem. And, in terms of this chapter, the fact that
people disagree has direct bearing on how, when, or if
evaluative information will be used.

Attribute five, even if axiomatic, is important to the thrust and tone
of this chapter. I take it as axiomatic that readers will not be interested in
the effect of evaluation on policy unless that evaluation contributes to the
resolution of a policy issue.

THE STRUCTURAL CONTEXT 

There are, fundamentally, three policy making agencies in Virginia in 
the area of education: 

1. the Department of Education, a part of the executive
branch of the government

2. the State Board of Education, appointed at the pleasure of
the Governor and operating through the Department, but
often quite independent of it

3. the legislature 

Policy matters may originate with any of the three. Generally, any 
policy matter originated in the Department is brought before the Board for 
approval, and the Board, seeking to take a stance on a policy matter, may
instruct the Department to provide the information necessary to make 
decisions. 

I will show in some detail in the next section that most major policy 
issues either enter the arena via the constitutionally required "Standards of 
Quality for Public Schools," also to be discussed below, or such policy 
issues come to rest there. In general, the Standards of Quality (SOQ) are 
drafted each biennium by the Department staff, approved with revisions by 
the State Board and approved with further revisions by the legislature. The 
legislature enacts the SOQ into law. It is thus obvious that the legislature 
is the final arbiter of educational policies contained in the SOQ no matter 
where the policy originated (unless the entire act is vetoed by the 
Governor). Again, as I will discuss in both the Historical Context and the 
Case History sections of this chapter to follow, the legislature in recent 
years has been very active in resolving new policy issues and reacting to 
policy matters brought to them by the Board and the Department. 

The Governor may, of course, independently formulate policy through 
his office or through those of the Secretary of Education or the
Superintendent of Public Instruction. In recent years, where governors have
acted independently, the target of their actions has been largely the 
domain of post-secondary education, not elementary or secondary public 
schooling. The Superintendent of Public Instruction and the members of 
the Board are all appointed by the Governor, subject to approval by the 
legislature. According to the constitution, the Superintendent's term is 
coincident with that of the appointing Governor. In fact, no 
Superintendent has gone out of office with the appointing Governor. It 
might be noted that Virginia governors may hold two four-year terms, but
not successively. The nine members of the State Board are appointed for a
four-year term and may succeed themselves once. Terms are staggered
such that no more than one vacancy occurs at a time. In the several years
that I have been with the Department, six of the nine members have been
succeeded, although one of the six resigned due to the press of other duties
as the mayor of a large city. 

The entire current Board has been appointed by Republican Governors 
although approved by Democratically controlled legislatures. It is 
sometimes alleged that the differing views taken by the Board and the
legislature on matters in recent years are a reflection of this differing
partisan affiliation. It is also alleged that the legislature fights some
of its battles with Governors through the vehicle of overriding the Board.
Not only is the truth of this statement not verifiable, the allegation is not 
readily apparent in public exchanges. I was told by several people in all 
seriousness that the differences produced by partisan affiliation would be 
obliterated by the communality engendered by Virginia's longstanding 
commitment to conservative tradition.

Finally, the media, especially the print media, have shown no
reluctance to propose or react to policy issues relating to public
education. This might be expected from the description of policy issues in
the preceding section. As pertains to evaluation, the press has been 
particularly active in commenting on the results of state mandated testing 
programs and trends in test results both in the state and nation. As will be 
seen, testing has come to constitute the bulk of evaluative information 
about public schooling at the state level. 

Figure 1 shows the structure of the Department of Education as of
December 1979. Only those areas outside of the Director of Research,
Evaluation and Testing which are also concerned in significant ways with
evaluation are labelled. Both the components of the Division of Research,
Evaluation and Testing (DRET) and the way in which information flows
from it to other parts of the agency merit comment.

Figure 1. Structure of Virginia Department of Education as of
December 1979 showing top echelons and location of
programs concerned with evaluation.

It is an administrative regulation of the Superintendent that any
information intended for the State Board must go through the
Superintendent's office. While the previous Superintendent delegated much
autonomy to the assistant superintendent for the flow of such information,
the incumbent personally "signs off" on all such information. The
importance of the quality of information flow from lower to upper levels in
the hierarchy is manifest.

Similarly, it is a regulation that contacts between the Department,
officially a part of the executive branch, and the legislature cannot be
initiated by Department members. The Assistant Superintendent for
Administrative Field Services serves as the official liaison between the
Department and the legislature. Any information thought important to the
legislature must be funneled through him. Legislators may contact
members of the Department informally for information or request them to
testify before the various committees and subcommittees of either house.
A brief report of all such contacts initiated by legislators in this way must
be filed with the Assistant Superintendent for Administrative Field
Services.

Virginia's DRET contains several functions which might not be
expected to be found in a so-named group and does not contain several
other functions which one might expect it to have. Notably it has no
responsibility for the evaluations of Title I, special education, or vocational
education. While this leads to administrative awkwardness and some
redundancy in efforts (children tested for Title I are retested under the
state program, but the magnitude of any practice or regression effects is
unknown), it has little real impact on policy at the state level. Most of the
above mentioned programs are constrained in terms of evaluation by
federal, not state, policies. Virginia has a long history of discordant
relations with Washington, D.C., and one often gets the impression that
where federal funds are involved, the Department prefers that offices
handling these tainted monies be as autonomous as possible.

This general feeling does not hold true entirely, as witness the
placement of the Title IV-C office with the DRET area. While arguments
could be made for its being elsewhere (and until 1977 it was with the
Special Assistant for Federal Programs), coupled with the Pilot Studies
program, which is a state funded innovative R&D effort, its placement
within the Program Development sphere and R&D unit also makes sense.

What is most noteworthy and important is that while there is a 
research staff designated as such and a testing staff designated as such, 
there is no evaluation staff designated as such. While accreditation falls 
into a valid category of evaluative information, the methods used in 
Virginia do not at the moment conform to the typical methods cited by
Webster and Stufflebeam for these operations, notably self-study and team
visits.

Accreditation is based largely on self-report on a questionnaire,
augmented by site visits. There is a secondary school evaluation
designation, with one person in this role who organizes those teams which
make site visits following a period of self-study by the local agencies.
There are recommendations made from these committees but there is no
follow-up to determine if the recommendations have been acted on, and in
any case no sanctions are imposed if problems are noted but
recommendations are not followed. Virginia is one of few states which
separates the accreditation and evaluation process for schools; the
evaluation process is voluntary and must be requested by the local
superintendent.

The structure of the DRET leaves little or no opportunity for what is
usually thought of as program evaluation. The research staff have skills
appropriate to such evaluation, but for the most part their energies are 
devoted to either the periodic surveys that the department conducts or to 
assisting the testing service with the myriad of programs which have been 
mandated by the legislature in the past three years. 

The lack of evaluation staff constrains the DRET in two ways: It must
either rely on the assistance of people from outside the division not well
trained in evaluative techniques or, more likely, it must turn to an outside
agency such as a university, contracted for a particular task. While such
contracts have advantages as well as disadvantages, the chief
disadvantages are that university faculties are often not aware of the
information needs of the Department, try unsuccessfully to fit the problem
into the paradigm of academic research, and cannot be "on site" often
enough to render assistance at its most timely occasion.



THE HISTORICAL CONTEXT

As indicated in the previous section, the use of evaluation is affected 
significantly by the structural context of policy making. It would be a 
mistake, however, to view this context as static. Indeed the purpose of 
this section is to treat that structural context as a dynamic, fluid one and 
show how it has changed over the past decade. The primary means of 
focusing on the historical changes will be by following the evolution of the 
Standards of Quality for Public Schools in Virginia and the evolution of the 
position of Director of Research, Evaluation and Testing as reflected, in 
part, by changes in the job description for that position. 

While any starting date for a "history" would be arbitrary, the year 
1971 seems appropriate to demarcate a formal change in thinking about 
education. By 1971, the policy of massive resistance to school integration
had in itself been largely abandoned. No issue had dominated public
education in Virginia quite the way that the ramifications from Brown v.
Board of Education had ("This will keep us in power for at least
twenty-five years," said one legislator in the 1950s, clearly seeing how
long the issue would be in the forefront), but in 1971 the debate revolved
not around whether, but how (i.e., through voluntary efforts or
court-ordered busing). More importantly, in 1971, the Commonwealth of
Virginia approved a new constitution for the state. A part of that
constitution reads as follows:

Standards of quality for the several school divisions shall be
determined and prescribed from time to time by the Board of
Education, subject to revision only by the General Assembly.
The General Assembly shall determine the manner in which funds
are to be provided for the cost of maintaining an educational
program meeting the prescribed standards of quality and shall
provide for the apportionment of the cost of such program
between the Commonwealth and the local units of government
comprising such school divisions. Each unit of local government
shall provide its portion of such costs by local taxes or from
other available funds. Art. VIII, § 2.

The Standards of Quality (SOQ) have come in the last eight years to be 
the final resting place of all policy issues where some state agency is the 
initiator of policy. Almost all major policy matters have either begun with 
changes in the SOQ, or if begun elsewhere (as, say, a change in the 
standards for accrediting secondary schools), have eventually made their 
way into this document. Similarly, most of the debates over policy issues 
at the state level can be found reflected in the changes of the SOQ over
time and in the manner in which they were changed by the respective 
bodies who write and rewrite them. 

The original SOQ for 1972-74 were performance-oriented with a
number of standards being expressed in quantified terms relating to
outcomes for both schools and individuals. Most of these outcomes were
countable and stated in terms of expected changes in the future. For 
example: 

At least 45,000 five-year-old children in the State will be 
enrolled in kindergarten (26,500 in 1969-70). 

Only one standard actually dealt with learner outcomes, and that 
standard, meaningless on its face, was to cause considerable controversy
and lead to changes in the SOQ. This standard and the controversy it 
produced will be discussed as part of the illustrative case history presented 
below. 

In 1973, the position of Director of Program Evaluation (later to 
become Director of Research, Evaluation and Testing), was created and 
carried, in part, the following job description: 

1. to provide leadership in evaluating State objectives
(purposes of education adopted by the State Board of
Education), programs (including standards of quality), and
student progress

2. to develop a training program for the State Department of
Education staff and for school division personnel in
translating state objectives into learner-oriented objectives,
many of which should be measurable

3. to develop, with the assistance of consultants and a
representative committee, the criteria needed by school
divisions to evaluate their own programs, organization and
procedures, reporting and progress, especially the progress
in student learning

4. to assist schools and school divisions in making realistic
evaluations and reports to the public

5. to encourage and assist institutions of higher learning to
evaluate their programs for pre-service and in-service
preparation of teachers

This broad, wide-ranging set of duties was never realized. The
Program Evaluation staff consisted at first of only the Director and one
other professional, and later a second professional staff member was
added. One reason for this lack of support was that the Standards of
Quality themselves were being revised substantially for the 1974-76 version
in a way that downplayed the need for evaluation. In this version, the
performance standards which had led off the list of original standards were
eliminated. The general goals of public education were stated and five
modified performance standards were listed as "objectives" with the note
that "school divisions may wish to establish additional specific objectives
to receive priority during the biennium."

The bulk of the standards, listed as such, were ten prescriptive 
statements categorizable as "input" standards. They specified that schools
were to provide kindergarten, special education, gifted and talented 
education, a certain number of professional staff for each certain number 
of students, etc. 

This shift away from the outcomes of schooling back to the provision
of goods and services (inputs) did not sit well with the legislature.
According to persons I interviewed who were around at this time, the
legislature felt that the intention of the SOQ section of the 1971
constitution had never been properly enacted by the State Board of
Education. In any case, the 1974 session of the legislature, noting that
there had been "discussion both within and without the General Assembly
as to what is appropriate to be included in the Standards of Quality...,"
created a joint subcommittee to study the SOQ. Formed in September
1974, the committee delivered its final report in December 1975, just
before the General Assembly convened for its 1976 session.

The words of the report itself reveal, as well as anything can, the scope
and philosophy of the subcommittee's study: 

Rather than confining its work only to the language of the 
standards, the joint subcommittee has sought to review 
comprehensively all aspects of publicly financed education in
Virginia. 

The outcome of this review, again in the words of the committee, was: 

To a great extent, the Joint Subcommittee's revision [of the 
Standards of Quality] has been based on the concept that the
quality of education is measured ultimately by what students 
have learned (output) rather than the quantity or quality of 
resources devoted to education (input). Whereas some standards 
must be oriented towards input, the greater emphasis should be, 
in the opinion of the Joint Subcommittee, on output.

Thus, very clearly and dramatically did the General Assembly, on
adopting the SOQ as revised by the Joint Subcommittee, change the thrust
of educational policy away from the traditional goal of providing goods and
services to measuring outcomes. In fact, many people outside of the
General Assembly felt that the legislature had not revised the SOQ so
much as they had actually rewritten them, thereby overstepping their
constitutional powers. According to these same observers, the Board
decided that any constitutional challenge could result, at best, in winning a
battle but certainly losing the war; accordingly, no such challenge was
made.

Two other quotes from the final report are worth noting because they 
set the stage for the introduction of two Standards which were indeed 
written, not revised, by the General Assembly. Continuing with its focus
on outputs, the Joint Subcommittee concluded its introduction with a set of
premises, the first two of which are as follows:

1. The basic purpose of the Standards of Quality is to establish
minimum elementary and secondary educational goals that
are to be met for each child (to the extent practicable)
throughout the Commonwealth.

2. Standards established by the General Assembly should be
oriented primarily towards products (objectives, outputs and
goals) rather than processes (inputs and means), thereby
creating a structure and environment for quality education.

To the best of my knowledge this is the first use of the word minimum 
in connection with the outcomes of public instruction in Virginia, and
provides the basis for a wide ranging set of changes in the orientation of 
public education. With these premises in mind the General Assembly wrote 
the following standards: 

Standard 1A. The General Assembly concludes that one of the
fundamental goals of public education must be to enable each
student to achieve, to the best of his or her ability, certain basic
skills. Each school division shall, therefore, give the highest
priority in its instructional program to developing the reading,
communications and mathematics skills of all students, with
concentrated effort in ... grades one through ... six. Remedial work
shall begin for low achieving students upon identification of their
needs.

Standard 1B. By September 1978, the Board of Education, in
cooperation with the local school divisions, shall establish
specific statewide minimum educational objectives in reading,
communications and mathematics skills that should be achieved
during the primary grades and during the intermediate grades.

And how was the General Assembly or the public to know if concentration
was being focused on these basic learning skills and if individuals were to
be receiving appropriate remedial work? By means of tests:

Standard 7A. By September 1978, each school division shall
primarily utilize testing programs that will provide the individual
teacher with information to help in assessing the educational
needs of individual students.

Standard 7B. Beginning in September 1978, each school division
shall annually administer uniform statewide tests developed by
the Department of Education to measure the extent to which
each student in that division has progressed during the last year
in achieving the specific educational objectives that have been
established under Standard 1B.

Standard 1 codifies a back-to-basics movement for Virginia.
Standard 7 provides perhaps the most ambitious, comprehensive program of
diagnostic testing in history. If, that is, the standards are to be taken at
face value. And while certainly these standards, conceived entirely within
the General Assembly, are not to be taken lightly (the objectives and
commensurate testing program are at present in place), there is good
reason to believe that the legislature was not fully aware of the
implications of what it was doing. The committee had been advised
primarily by one legislative aide, untrained in psychology or education.
While that aide read a great deal of background research, and while the
committee as a whole learned a great deal about testing, the final report
of the Joint Subcommittee is a melange of the Zeitgeist, theory, errors and
naivete. As paper after paper was delivered by the legislative aide to the
committee, "We began," in the words of one committee member, "to
conclude that he knew what he was talking about when in fact what was
being created was legislation by inundation; we were simply inundated with
papers about objectives and criterion-referenced tests and so forth."

The aide had indeed "discovered" criterion-referenced testing and
proposed it as the only reasonable alternative to current testing practices.
The following quote from the final report of the Joint Subcommittee is
revealing of the errors and naivete alluded to above:

Far too much emphasis in testing has been placed on how a group 
of students (a classroom, school, division, or state) compares
relative to a "norm" group. Relative rankings bear no direct 
relationship to an absolute level of academic competency. 

Particularly with basic skills, knowledge is more absolute than 
relative. Thus, use of relative rankings or percentile scores 
masks any change in the absolute acquisition of skills or
knowledge. The Educational Testing Service, which administers
the College Entrance Examination, has noted a steady decline
over the last ten years in the absolute academic achievement of
students taking its examination. "Norm referenced" tests do not
show the decline that has actually taken place. 

Such delightful confusion could be accepted if it did not accompany a
document proposing a program that no one had yet accomplished: getting
all or most teachers to use tests, any tests, as meaningful, not to say
diagnostic, instruments!

The Department of Education's involvement in this policy change was
nearly nil. Indeed, the Department had been operating independently on its
own initiative. The Director of Program Evaluation had convened in 1975 a
State Testing Committee made up of Local Education Agency (LEA), State
Education Agency (SEA), and Institutions of Higher Education (IHE)
personnel to propose a comprehensive testing program for the state. Much,
though not all, of their work was rendered moot by the actions of the
legislature (the State Testing Committee did not make its final report until
December 1976, some nine months after the action of the legislature). The
Department apparently had no inkling of what the legislature was about to
propose; the legislature, for its part, did not acknowledge the existence of
the State Testing Committee.

The effect of the new SOQ for 1976-78 was to strengthen an equation
that had been in the making for the preceding two years. That equation
was simply evaluation = testing.

The new SOQ were passed in March of 1976, the Director of Program
Evaluation resigned in August and the position remained vacant for nine
months. When it was advertised again, the job description had changed
considerably. Now called the Director of Program Evaluation and Testing,
the new ad specified a person to provide:

Leadership in developing a program for measuring individual 
student attainment of basic skills; developing a program for 
assessing student achievement; coordinating the work of staff 
members in developing, administering, and interpreting the 
statewide testing program; and developing and administering the
testing program budget.

Evaluation = school outcomes = test scores. A rather different
orientation than in 1973.

It is quite possible that all testing except that prescribed by the new
SOQ would have gone by the boards in 1976 had not Virginia's intermittent
policy making body, the press, jumped into the fray. In both articles and
editorials, newspapers, particularly those in Richmond, the state capital,
argued that the elimination of norm-referenced tests (NRTs) would lead to
chaos as it had in other states. California was cited as a state which had
changed tests so often that no one knew where the state was, what the
anchor for scores was. The NRTs were kept.



A CASE STUDY: THE EVOLUTION OF VIRGINIA'S GRADUATION
COMPETENCY TESTING PROGRAM 

If the history of the Standards of Quality provides a general
framework for the evolution of the equation "testing = evaluation," the
history of Virginia's movements in the area of "minimum competency
testing" provides a concrete example of that equation in action as well as
how policy decisions are made and how programmatic decisions get
elevated to policy levels.

In a period of two and a half years, Virginia moved from having no 
competency requirements at all, through a stage of having an "Oregon 
plan" with localities having latitude, to having a uniform statewide 
graduation competency testing program.

In 1976, acting to head off a legislative mandate and in reaction to
"the handwriting on the wall," the State Department through the State
Board proposed that a set of competency requirements be added to the
standards for accrediting secondary schools. While some evaluative
information was considered (test score declines as reported by the
Educational Testing Service (ETS) and the National Assessment of
Educational Progress (NAEP) and fluctuations in the results of the State
Assessment Program), there is no evidence that the decision to add
competency requirements was significantly influenced by these results.

The requirements dealt with computation, communication, social
studies and ability to successfully pursue post-secondary experiences either
in the marketplace or in higher education. The first three were adopted
almost verbatim from a publication of the National Association of
Secondary School Principals (NASSP) entitled This We Believe. The fourth
area was derived from the goals of public education as stated in the
Standards of Quality.

In considering to whom this new requirement should apply first, the
Department and the Board were guided largely by an intuitive sense of fair
play. The Department and the Board felt that the children to be affected
should have ample warning before hitting the barrier. In terms of the
course of schooling, it made sense that the requirement could justly be
imposed on those children who were about to enter the ninth grade. They
would know about the requirement ahead of any high school years and have
the full four years to meet the requirement.

The plan adopted was referred to as the "Oregon plan" although it
differed significantly from the Oregon program in one respect: While
Oregon allowed local divisions to establish the competency areas, the State
of Virginia specified them. However, as in Oregon, the determination of
how to assess competency and how much of any competency constituted 
enough was left up to the localities. 

The localities, with little input into this requirement, gave the action
mixed reviews and treated it with mixed reactions. Some divisions already
had a similar requirement and essentially ignored the Board's action.
Other divisions took the new requirement seriously and began to plan in
various ways to meet it. The most common form of planning was to
develop or purchase a test.

Almost immediately a large number of local divisions began to lobby 
the Board, the Department, and various legislators to move the 
competency assessment program to a statewide level. Some lobbying
efforts derived from legitimate concerns, some did not. Many divisions in
Virginia are small and lack the staff or money to develop or even purchase
assessment devices in four areas. Others, it appeared, simply did not want
the bother of the development process and some, seeing a fertile lode of
future litigation, preferred that the state and not they be hauled into 
court. The pressures for statewide assessment of the competency areas 
grew during the fall of 1977 while legislative committees were meeting. 

When the legislature met in early 1978, an addition was proposed to 
the testing and measurement standard created by the legislature in 1976. 
As originally written, the 1978 standard would have required tests to be
used both for graduation and for promotion from grade to grade. This
standard was debated hotly over a period of two months, but no member of
the Research, Evaluation and Testing staff was called on to provide any
kind of testimony about the wisdom of the standard. And, unlike the
original testing and measurement standard which had been preceded by a
year-long, if somewhat flawed, study, no such study was conducted prior to
the introduction of this new standard.

The addition of a standard calling for a statewide graduation test was
really no surprise, as the local superintendents had gone on record as
favoring it. Calling for the use of tests to determine promotion, however,
was something of a shock. On what evaluative information was this statute
proposed? One cannot be certain because of the lack of contact between
the Testing Staff and the legislature, but it is reasonable to assume that
the legislature was responding to what had become known as the 
"Greensville Experiment." To understand the Greensville Experiment and 
its power requires a digression, but it is an illuminative one. 

The original Standards of Quality had called for a division's 
achievement level to match its ability level, both levels being determined 
by NRTs.8 While most divisions had close correspondences between the 
two tests, Greensville County did not— the achievement level was well 
below the "ability." Based on these results, the State Board singled out for
public censure the County of Greensville as failing to meet the Board's
Standards. Greensville's response was to retain about 67 percent of the
fourth graders and high percentages in grades 1, 2, and 3. All those who
did not achieve at a certain level on the NRT were retained. While this
caused a short-term furor within the County, the superintendent was able
to build public support for the program, which involved more than simple
retention based on test scores. Greensville's superintendent presented the
program as offering low achieving children "more time to learn" and was
able to convince a substantial [portion of the public] that it had this
benign intent, thereby removing most of the usual stigma associated with
"flunking." The superintendent was able to convince the majority of his
constituents that it would be unfair to promote the children who had not
achieved a certain level on the tests; that it would in fact be more humane
to allow them more time to learn these skills by repeating the grade than
to go on and encounter even more difficult subject matters in higher
grades. By 1977 favorable reports on Greensville had been written in
various state newspapers and Time magazine, and shown in prime time on two
of the three commercial television networks. Greensville's program is
complex and difficult to evaluate partly because of inadequate baseline
data and partly because of the need for administrative decisions which
outstrip researchers' ability to gather data. Greensville is a small division
with essentially a three-person central office. One can scarcely fault them
for not being research oriented, for operating with an administrative style
of deciding what needs to be done on the basis of the collective wisdom of
the office and the school board and doing it. However, the general public
and some legislators had gotten a simplistic notion of the program that said
essentially, if you fail kids, scores go up. The fact that those children with
low scores were being retained and that this selectivity alone would make
scores appear to increase was not a part of this simplistic conception.
Part, but only part, of what Greensville County had done was to test third
graders locally and retain those scoring below a certain level. Thus when
the succeeding fourth graders took the state required test at that grade,
the fourth grade scores appeared to rise dramatically. Other increases
occurred in other grades.
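
The selectivity argument can be made concrete with a small simulation. The sketch below is illustrative only: the score distribution, the cutoff, and the function name `simulate_retention` are assumptions made for the example and are not Greensville data. It assumes no instructional effect whatever and compares the average score of a whole cohort with the average of only those children promoted past a cutoff.

```python
import random
from statistics import mean


def simulate_retention(n_students=1000, cutoff=40.0, seed=0):
    """Compare the mean score of a whole cohort with the mean of only
    those students promoted past a test-score cutoff.

    All numbers are hypothetical: scores are drawn from a normal
    distribution (mean 50, SD 10) purely for illustration.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(50.0, 10.0) for _ in range(n_students)]

    # Children at or above the cutoff move on; the rest repeat the grade.
    promoted = [s for s in scores if s >= cutoff]
    retained = [s for s in scores if s < cutoff]

    return {
        "cohort_mean": mean(scores),
        "promoted_mean": mean(promoted),
        "retained_share": len(retained) / n_students,
    }


result = simulate_retention()
print(f"whole cohort mean:  {result['cohort_mean']:.1f}")
print(f"promoted-only mean: {result['promoted_mean']:.1f}")
print(f"share retained:     {result['retained_share']:.0%}")
```

Because no child's score changes in this sketch, any gap between the two means is produced by selection alone, which is the point missing from the simplistic conception.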




By the school year of 1977, however, the children retained in the lower
grades for one or more years were now 14 years old. They were also, in
many cases, in ungraded classes. However, for purposes of funding by the
state the children had to be declared as either eighth graders or special
education students. By declaring them as eighth graders they would be
eligible for intermural athletics and they were so declared. But being now
eighth graders, they came under the law that required all eighth graders to
take the state NRT. The result was that the scores that were apparently
high took an equally apparent nose dive. A legislative resolution
commending the superintendent of Greensville for his efforts was tabled
and the promotion-by-test section of the proposed new standard was
deleted. Such is the relationship between test data and policy. The
standard calling for a graduation test was retained, however, and in its
final form read as follows:

It is the policy of the Commonwealth that the awarding of a high 
school diploma shall be based upon achievement. In order to
receive a high school diploma from an accredited secondary
school after January 1, 1981, students shall earn the number of
units of credit prescribed by the Board of Education and attain
minimum competencies prescribed by the Board of Education.
Attainment of such competencies shall be demonstrated by
means of a test prescribed by the Board of Education.

Certain characteristics of this policy statement and its concomitant 
action requirement stand out. The policy is in one sense extremely
prescriptive in that a test is required. In another sense, the policy is 
extremely liberal in that no areas are defined for the testing. Under the 
letter of the law, the State Board is free to prescribe a test in the basic 
postures of hatha yoga. In one sense the policy is extremely vague in that 
it does not define test. And a number of people were concerned that 
because the standard referred to "a test" the Board was not free to 
prescribe different tests in more than one area. 

Needless to say, hatha yoga is not a requirement for graduation, and
test has been interpreted as four-choice multiple choice. The Board
decided that they could require more than one test (reading and
mathematics) and no legislator has complained that this violates either the
letter or the spirit of the standard. That such concerns about wording
could be raised and discussed with a semblance of seriousness is indicative
of the sometimes fragile relationship among policy making bodies.

While the legislature altered the policy of how competency was to be 
demonstrated, it left the year when the statute became effective
unchanged. The class of 1981 was (and is) under the gun. Again, no one
from the Testing Staff of the Department was asked to testify concerning 
the effect of these policy and action changes. At this point, no litigation 
had been resolved in Florida, and McClung had only recently (1978) 
suggested that the length of phase-in time for such a program could be a 
source of litigation. The Department felt strongly, however, that the 
children to be affected should be tested as early as possible. Early testing
would allow for remediation and prevent any feeling that the rules of
the game for getting a diploma had undergone a sudden and capricious
change in the eleventh hour (or the eleventh grade). Accordingly, the
Department convened two committees of reading and mathematics experts
for the purpose of reviewing tests in these areas and recommending to the 
Board which should be used. 

It should be noted in passing that the calling of these committees 
reflected a long-standing Department policy. Whenever changes in
educational programs are likely to have significant impact on local school
divisions, committees representing different regions or different levels of
organizational status, or both, will likely be convened to "evaluate" the
change and advise the Department and Board. In my experience, such 
committees function substantively, not as window dressing. 

The two committees consisted of teachers, supervisory staff,
university faculty and representatives of various special interest
organizations. Their charge was to review all tests that had been
developed in Virginia during the "Oregon plan" years, as well as those tests
that were coming onto the market from commercial publishers. Two tests 
were recommended to the Board and accepted by it in June, 1978. 

With the tests chosen, another question arose: How much competency
on these tests is enough? Where is the cut-score? This issue might seem 
at first to be too minor to constitute policy. On the other hand, it 
certainly meets all of the criteria for a policy issue as described by Mann 
and cited in the introduction to this chapter. It is a public problem, with 
important consequences, with political, economic and moral dimensions, 
involving a goodly amount of uncertainty and viewed very differently by 
different interests. In any case, the amount of heat generated by the
debate elevated the decision to the policy level.

Most of us directly concerned with testing were largely in agreement 
with Gene Glass (1977) that cut-score decisions could not be based solely
on technical considerations. We were, likewise, in agreement with Glass
that:

For most skills and performances, one can reasonably imagine a
continuum stretching from "absence of skill" to "conspicuous
excellence." But it does not follow from the ability to recognize
absence of the skill . . . that one can recognize the highest level
of skill below which the person will not be able to succeed (in
life, at the next level of schooling, or in his chosen trade) . . .
Imagine that someone would dare to specify the highest level of
reading performance below which no person could succeed in life
as a parent. Counter examples could be supplied in abundance of
persons whose reading performance is below the "minimal" level
yet who are regarded as successful parents (pp. 237-261).

On the other hand, a cut-score was necessary and thus the search had
to be confined to methods that would reduce arbitrariness in its bad sense
of being capricious. Several methods were considered and that which
seemed most congenial to us was the method proposed by Jaeger (1978).
While openly judgmental, the procedure allows for judgments by various
audiences and a series of iterations before reaching a final decision, which
allows judges to change their minds based on new information. However,
one of the cycles of the model requires that data from item field tests be
provided to judges so that they may know how many children actually
answered a given item correctly. This cycle seems particularly important
in view of other similar models which operate without actual data and
often lead to unrealistically high scores (e.g., procedures for validating the
National Teacher's Examinations).

However, with the legislation enacted in March and testing scheduled
for November, clearly no field test data was going to be forthcoming, and
the two tests chosen had limited field testing, and none in Virginia. In June
of 1978, the testing staff recommended that the setting of the cut-score be
delayed until actual test results were back and that the results from the
first administration be used in the Jaeger model where data were called for.

No action one way or another was taken on this recommendation for
some months. In late August or early September the press became
cognizant that no cut-score would be established until results were in, and
in several cities allowed as how such a procedure would permit the scores
to be "tampered with" to make the results both politically and
economically acceptable to the "educationist establishment." In reaction
to these allegations in the press, the State Board attempted at its
September meeting to set a cut-score in the absence of any procedures.
"Let's set a score and let the chips fall where they may," said the President
of the Board. "We can always change the score if we need to." I and two
other members of the Department argued vehemently against this
approach, and after several heated exchanges, won a month's delay to
conduct some kind of study only by promising to provide a recommended
score at the next Board meeting. During the month, seven groups of about
15 people each were convened in various divisions of the state at the
invitation of the local superintendent of the division. The instructions to
the superintendents were to pick people representing professional
educators of all levels, and parents and other interested community
members. Each item was examined in terms of its importance and then a
global rating was obtained as to what would be a fair passing score. The
range of scores was 35 to 85. When analyzed according to a lay-educator
dichotomy, lay persons wanted a score around 75, educators around 70.
The Department recommended 70, which was accepted by the Board.

It is worth noting that feelings among the testing staff were so high in
opposition to this haphazard approach that others in the Department
administration "absolved" us from any responsibility in the conduct of the
modified, less-than-rigorous cut-score study. In retrospect, I feel that this
cut-score was predetermined, that it was not likely to have been anything
other than 70 (the judgment of the groups actually favored a score of 75
and had the testing staff been in charge of the recommendation, that would
no doubt have been the recommended score; if the testing staff had
conducted the study it is not clear where the cut-score would have fallen).

Again, in retrospect, it was good that the cut-score was in fact
established before the test was given. Even though in the actual procedure
political considerations and public relations considerations weighed more
heavily than conceptual soundness, a technically sound study conducted
before the test administration might have led to more "disruption and
dislocation," in Glass' phrase, and a cut-score set after the test
administration certainly would have maximized disruption and dislocation.
This unfortunate outcome would have been determined by events
surrounding the test administration that could not have been foreseen prior
to that administration.

The competency testing program had been good copy for the media.
Both print and electronic media representatives had been keeping close
tabs on events surrounding the competency program. When the test was
actually administered, schools were deluged with reporters interviewing
students for their reactions. The tests, said the students without
exception, were "a piece of cake" and "an insult to my intelligence," etc.
Apparently no reporter questioned whether or not this sample might be
biased: that children who experienced difficulty would either steer clear of
the cameras or be loath to admit hardship in the presence of peers sneering
at the ease of the test. In at least one instance we obtained reports that
only the "right" students were being aimed in the direction of the cameras
by the school staff.

In addition, approximately a month after the test administration and 
before any results were received, sample items of both tests were released 
at a news conference. These items did little to convince the press that the 
test had been difficult, and one paper published the items with the first of 
a series of vitriolic editorials attacking the test as a farce and public 
education as a sham, bilking the public by permitting children to graduate 
knowing so little. 

One can only imagine what would have happened in this charged
atmosphere if the cut-score had been set after all this publicity. It is
likely that any group charged with establishing a passing score would have
felt obligated to set the score higher than what had been established prior
to the administration. With the score set at 70, 9% of all white students
and 33% of all black students failed the reading test. If the score had
been, say, 80, these figures would have risen to 25% and 63%, respectively.

Before leaving this particular topic, it should be noted that while the
data preferred by the testing staff could not be gathered, some information
was gathered for use in establishing a cut-score. Some 100 people did
render judgments about the appropriate cut-score. The extent to which
this data influenced the cut-score or to which it "improved the decision
making process" is undeterminable. What is clear is that this is the type of
information that will often have to be used in informing policy makers.
Because of time pressures, political, economic and public relations
considerations, it is likely that the need for a decision will usually outstrip
the professionally desirable methods for collecting information. If
evaluators are not simply going to take their marbles and go home (thereby
rendering their contributions nil), they must learn to cope with and use less
than "pure" data. The techniques for such coping and use and the criteria
for evaluating the power of such data are by no means clear, although the
exponential increase in evaluation methodologies (everything now seems to
be a metaphor for evaluation) is testimony that evaluators are at least
aware of the problems.

With a cut-score set and the tests given, the question now arose as to how
to release the data. For many years test scores were not released, but in
1971, the State Board, reacting to both public pressures for such scores on
a division-by-division basis and an opinion of the Attorney General, decided
to make such release a matter of course. In spite of an attempt to defuse
invidious comparisons among the divisions by emphasizing the desired
match between "ability" and "achievement" (noted earlier), this practice
did not sit well with many legislators, as noted in the statements from the
Joint Subcommittee report. Nevertheless, the Board's decision was not
challenged and the Department planned a release in this fashion for the
Graduation Tests. In addition, because of results already reported in
Florida and North Carolina showing marked racial differences, it was
decided to report the data analyzed by these two categories. The results
were accompanied by a brief summary-interpretation prepared by the
testing staff.

The racial differences in passing rates made big headlines around the
state. In addition, a number of divisions that do not traditionally do well
on the norm-referenced tests did very well on the Graduation Test, much
better than neighboring divisions with similar NRT scores. This led in some
instances to phone calls to the Department questioning the validity of the
results for the high scoring divisions and in a few cases to open charges
that some divisions had somehow cheated: had taught to the test, withheld
certain percentages of their kids who should have been tested, etc.

Such accusations place the Department in a difficult position. It lacks
the resources to administer the tests and certainly cannot verify the
challenges of impropriety. The best that the Department can do is to
prepare a narrative to defuse these kinds of attributions. However, this
course has its own dangers. If the narrative is too long and makes too
many points, the media accuse the Department of managing the data and
ignore the narrative.

If the narrative contains information which the Department feels is
important but which is complex (which cannot be dealt with in a few
paragraphs or during a 60-second segment on the evening news), such a
narrative is also likely to be ignored. For example, while the failure rate
for blacks was four times that of whites statewide, this rate did not prevail
in all divisions. Indeed, in a few divisions, blacks passed at a higher rate
than whites, and an examination of those divisions where blacks performed
well indicated that the results could not be a simple function of
demographic variables, socio-economic status (SES) or anything simple. In
all likelihood they reflect subtle program and extra-school variables. But
subtlety has no place in the face of deadlines and short on-camera reports,
although the variations discussed above were reported in the narrative.

The point of this discussion is that evaluators in possession of "public
information" often have a difficult time getting that information to the
public in usable form. The Department is often forced to do a "data dump"
and hope for the best. Presentation of what the Department considers
important is often viewed with suspicion as being that which the
Department considers to be in its best interest.

I would note in conclusion, however, that, in fact, there is never really
a "data dump" but only a method and format of reporting certain
information to which the press and others have become accustomed. It has
been noted by numerous philosophers of science that "data" should really be
called "capta": that nothing is given but rather is taken, captured. The
problem for evaluators is to obtain acceptance for the kinds of capta they
consider relevant, not simply those which may be relevant to the press.

POSSIBLE FUTURES FOR EVALUATIVE INFORMATION


In the preceding sections of this paper, we have seen that the
influence of evaluation on policy in Virginia has been a very limited one,
having been constrained by the structural context, as well as historical
events. The structural constraints were two-fold: an absence of
evaluation staff and strictures on the flow of information which would
make it difficult to get relevant information to those making policy
decisions. The Zeitgeist has also constrained the influence of evaluation in
two ways. People have not seen evaluation as particularly relevant or
important to policy except in a negative way. Secondly, the spirit of the
times says that, in fact, the evaluation data are already in: Test scores are
declining, costs are going up, teachers cannot teach and the whole
structure needs revamping. This aspect of the Zeitgeist clearly includes
the general public, as can be seen in the annual Gallup polls on public
opinion of education, as well as the policy makers, as can be seen from the
number of legislated educational programs.

In Virginia, one senses, though it is difficult to demonstrate with
"hard" evidence, a punitive, Calvinistic attitude towards the schools. The
schools have failed to control the natural depravation of mankind and must
be punished for their shortcoming. It is never phrased that way, of course,
but the behavior speaks louder than the words. I have often been asked by
people outside of Virginia what remedial programs the state is providing
for those who fail the Graduation Competency Test. My answer is none,
there is no money for remedial programs. When the same enquirers have
asked, "Is this not a little unfair?" my reply is this: Many who could provide
funding believe that if the schools were doing their job there wouldn't be
anyone failing the test and so no additional money will be forthcoming to
assist what should be done anyway. Not all policy makers feel this way,
but enough do to prevent any sums being appropriated for such programs or
for legislation being introduced to provide money.9

Similarly, the Standards of Quality can be assessed in terms of their
per-pupil cost. That is, how much money on the average does it cost to
provide the programs required by the Standards of Quality? According to
data collected at the local level and state level, the standards have
consistently required more money than the legislature has appropriated for
them. This failure to appropriate actual costs for constitutionally required
standards, largely written by the legislature, reflects again a certain
attitude of punitiveness towards public education.

As another instance of constraints on the use of evaluation at the
state level in Virginia, let us consider a situation in no way unique to
Virginia, but rather symptomatic of certain organizational structures.
Evaluation, to be used properly, must take place in an atmosphere where
there is some freedom to fail. If one is not doing pseudo-evaluations where
the results are determined beforehand, one cannot guarantee the outcome
of an evaluation. Such freedom does not exist in education and such
freedom does not usually exist in bureaucratic structures competing with
other such structures for money and power. Rich (1979) distinguishes two
different ways of "avoiding risk" in organizations. For scientifically
oriented academic researchers, reduction of risk consists of, and in the
ideal requires, new information which may contradict earlier information.
The goal is truth and the new information reduces the risk of being wrong.
While I think Rich's description of academia's open arms acceptance of new
information is self-serving and overstated,10 certainly there is more
freedom to fail in academic settings.

Rich is more on target when he speaks of the Manager's Perspective of
Risk Avoidance. Here risk relates to competition for scarce resources.
The manager in a bureaucracy is likely to ask how new information might
be used to embarrass him or to place another agency in a more favorable
light. Evaluation in this context does not have a "truth" orientation but a
political one. Evaluation will be well received if it helps the organization
and manager meet goals, minimize costs, and maximize gain. This means
that evaluation, as customarily defined, has an uphill fight: it takes place
in a value-laden political context. Education is in an area of competition
for scarce resources and in an era of negative opinion towards its
achievements. In such an area in such an era the cost of looking bad is too
high to permit a scientifically detached perspective on evaluation.

Finally, the role of evaluation in Virginia is constrained because it has
been narrowly defined, namely by test scores. It is safe to say that until
test scores cease to be an area of concern, such potentially fruitful sources
of evaluation as affective variables or process variables will be ignored.
Similarly, until test scores per se become a nonissue, little will be done in
the way of process or formative evaluation.

It would be easy after reading the above paragraphs to conclude that
evaluation is unlikely to ever be usefully applied in Virginia through the
Department of Education. Easy, but wrong. While events do not augur
well, there are areas where imaginative and energetic use of data and
lobbying by those interested in evaluation can produce results.

The present press for accountability in education is likely to increase.
The problem for evaluators is to eliminate the equation of accountability
with test scores and write a set of equations including many variables.
While there is no guarantee that such a conceptual broadening can occur or
be used properly, there is at least one area of accountability where it
might. The focus on competency in Virginia has moved from students to
teachers and while there is, yet, a test required for new teachers, all
concerned recognized that such a test will not guarantee anything except
the screening out of near-illiterates. There is a recognition that knowledge
and competence are different and that competence is related to behavior.
This is not a recognition well articulated yet, but it is there and the
necessity to move towards assessing teacher performance would seem to at
least open the door to a broader conceptualization of evaluation.

Similarly, evaluations could take policy issues and relate them to
empirical research in such a way as to broaden perspectives and hopefully
open the way for better utilization of information. For example, class size
has been a policy issue in Virginia for some years. There has been a
mandated reduction in class size as a part of the SOQ for some years. The
mandate occurred without data. Now, Glass and Smith and others have
provided some widely accepted research findings relating class size to both
achievement (Glass and Smith, 1979) and affective outcomes (Smith and
Glass, 1979). The key, it would seem, would be to identify those policy
areas that already are issues and try to provide as much evaluation
evidence as possible even if the evidence cannot be collected by the state
department.

The job description for my position has undergone considerable 
revision from that given earlier and puts emphasis on evaluation. As noted 
before, however, this titular emphasis has not been backed up with a staff
adequate to the job responsibilities. 

Evaluation related information has come into prominence in recent
years, even if it has not always been used properly. A decade ago, no one
in Virginia was particularly concerned about assessment. Now tests are
everywhere. A thorough history of how test use grew in kudzu-fashion is
beyond the scope of this paper, but such a history might provide some clues
about how to build interest both in other means of assessing outcomes and
assessing other outcomes.

Finally, those interested in evaluation must just keep pushing
evaluation as an important and useful activity. In the two and one half
years that I have been with the Department, workshops have been held both
for persons in the field and in the Department on techniques of evaluating
projects and proposals. Each time a new program activity is proposed, the
question is asked (by me): How are you going to evaluate it? The extent to
which this awareness building activity has been productive is not fully
known, but there are encouraging signs that evaluation is being considered
more as an integral part of programs from their inception. This is
particularly true in formative evaluation where the information is used
more as a guideline for further action than as summative judgment, and
hence maximizes the risk avoidance necessary in a bureaucracy.

In conclusion, it should be noted that "evaluation" is also changing. A
decade ago, evaluation was conceived largely in terms of the laboratory
model of research described by Fox (1977). The inadequacy of this model
has been widely recognized and there is now a plethora of papers drawing
from fields other than psychology and education to prescribe techniques for
educational and psychological evaluation. There has been, similarly, a
recognition by many that evaluation does occur in an environment where
power, prestige, and economics often have a higher value than "scientific
rationality." The degree to which changes in evaluation will produce
models more relevant to more audiences is not clear. The degree to which
evaluators will be willing to participate in "impure" research, the point at
which they will feel that their hands are too politically dirty, is also
unclear. One must hope that the activities of evaluators will produce a
better match between policy and results than now exists, and work to that
end.

FOOTNOTES

1. Cautionary note: While readers may experience difficulty with this
chapter for a number of reasons solely the fault of the author, readers
addicted to the words "implement" and "impact" will experience additional
difficulties as these words occur nowhere in the text in either noun, verb,
or adjective form.

2. While personal experience does not constitute "evaluation," an
evaluator with an interest in influencing policy would be well advised to
see that his information became a part of the "personal experience" of the
policy maker. Constraints on evaluators doing this will be discussed later.

3. These categories are: objectives based studies, accountability
studies, experimental research studies, testing programs, management
information systems, accreditation/certification studies, policy studies,
decision-oriented studies, consumer oriented studies, client-oriented
studies, and connoisseur based studies. Although Stufflebeam and Webster
present them as "types," the author does not presume them to be mutually
exclusive.

4. These are phrases from Berlak and do not violate the promise of
Footnote No. 1.

5. The Superintendent from 1975-78 was previously a local
superintendent of the only division in the State that did not have a Title I
program.

6. Virginia operates most of its programs on the basis of biennial
plans.

7. There is some anecdotal evidence that some people did not really
take the requirement "seriously" until after the first results of the later
statewide tests were released in early 1979.

8. In fact, as phrased, the standard did not have any meaning. The
interpretation given to the standard was that the average percentile rank
for a division in achievement should be equal to or above its average
percentile rank for ability.

9. After the first draft of this paper was written, a part of the
1980-82 Standards of Quality was changed to permit the hiring of extra
personnel for eighth-graders scoring three or more years below grade level
on the eighth-grade norm-referenced tests given annually. These children
are known to be at some risk in terms of passing the Graduation
Competency Test. The change thus has the effect of providing remedial
assistance although it is not phrased that way. This kind of indirect use of
evaluation information is becoming more common in Virginia.

10. The case of acupuncture vs. the American Medical Establishment
comes immediately to mind. Rich's statement might be better phrased to
the effect that new information is welcome in direct proportion to its
potential for getting grants and publications for the receiver of the
information. Resistance to change in the scientific community has been
beautifully and amply documented in Thomas Kuhn's The Structure of
Scientific Revolutions.

CHAPTER 2



The Michigan Experience 

David L. Donovan & Stanley A. Rumbaugh 



The last fifteen years have been a period of transition in Michigan
education. Historically, the governance of education was delegated to
local boards of education. The State Constitution, statutes, and
regulations tended to provide wide parameters for local programs and
policy making. The Department of Education initiated little policy and was
often criticized as a "do nothing" agency.

The early 1960s set the stage for change. Prior to 1964 the State 
Board of Education consisted of three members, plus the Superintendent of 
Public Instruction. The members, including the Superintendent, were 
elected at a biennial spring election. Since voter turnout at the spring 
election was usually small, educator organizations found it easy to 
influence the election of persons friendly to the concept of local control of 
education. The constitutional authority of the Superintendent was stated 
as ". . . shall have general supervision of public instruction in the state . . ." 
and, ". . . duties and compensation shall be prescribed by law."1 The
explicit authority of the Board was, ". . . shall have general supervision of 
the state normal college and state normal schools, and the duties shall be 
prescribed by law."2 The State Superintendent and Board gave direction 
to a relatively small Department of about one hundred thirty professional 
and clerical employees. The authority was weak and the resources needed 
to govern a state educational system of over 700 school districts providing 
instruction to nearly two million students were inadequate. Thus, few 
policy initiatives emanated from the state to give direction to Michigan
education. 

The mid-sixties brought together several changes in thought, and
several events in Michigan and the nation, to produce a different State
Board and Department. The basic change was incorporated in a new State
Constitution adopted in 1964. It redefined the role of the State Board of
Education, and changed the election process.

The 1964 Constitution established an eight-member board with
candidates nominated at party conventions and elected at the regular fall
biennial elections. Terms of office were made eight years. The
membership of the Board was completely changed. A more subtle change
than the election process and membership was the attitudes of the new
members. They tended to have constituencies beyond education, to have
social concerns beyond education, and to have political ambitions beyond
the State Board of Education.

The new Constitution expanded the role and authority of the State
Board of Education. The State Board duties were to be, "leadership and
general supervision over all public education, including adult education and
instructional programs in state institutions, except as to institutions of
higher education granting baccalaureate degrees, is vested in a state
board of education. It shall serve as the general planning and coordinating
body for all public education, including higher education, and shall advise
the legislature as to the financial requirements in connection
therewith."3 The persuasion of the Board was to develop a program of
overall supervision of education and to initiate policies in keeping with
their leadership mandate. The State Board was the policy board for the
Department of Education. The State Superintendent was both the
Chairperson of the Board and the Chief Administrative Officer of the
Department. The Department of the past had to be changed to be
responsive to the active role the Board wanted, and to the new world of
education.

THE DEPARTMENT OF EDUCATION 


The events of the time provided a setting. A collective bargaining
statute was enacted, and old power structures were being altered. The
courts increasingly entered into educational matters. Educational issues
were often being identified and defined outside the educational community;
there was a need for strong leadership in Michigan education.

The passage of the Elementary and Secondary Education Act in 1965
(ESEA) not only increased the Federal presence in education, it increased
the role of state agencies in education. Most of the federal ESEA
programs flowed program money through the state and provided
administrative funds to the state. The resources available to the State
Board and Superintendent were increasing and with them their ability to
initiate policy was increasing.

As State Board members began to address their new responsibilities,
an obvious question was posed: how "healthy" is Michigan education?
Answering the question was more difficult than expected. Although the
Department collected some financial and staffing data and issued a few
statistical reports, virtually no evaluation activities existed. No effort was
made to gather and analyze a broad range of information about the schools
and districts of the state. Certainly there was no effort to identify
inadequacies, inefficiencies, and inequities in the system. This paucity of
information presented the policy makers a dilemma: a desire to provide the
leadership for educational improvement, but no base of information about
what changes were needed most. There were beliefs that some districts
supported education at a much higher level than others, that learning levels
were disparate, and that the conditions for educating were vastly different
throughout the state.

Members of the State Board were anxious to perform in their
leadership role and were not willing to wait until the Department could
develop an evaluation capability and produce the information they needed.
Rather, the State Board, with support from the Legislature, contracted
with Dr. J. Allan Thomas of the University of Chicago to do a thorough
study of Michigan education. His charge was to gather together
information on the system, to describe the system, and to offer
recommendations for improvement. The study took eighteen months and
culminated with a report issued in the fall, 1967. The report drew several
conclusions, among them:

1. There is a great variation in the educational opportunities
available to students in the State of Michigan (Thomas,
1968, p. 321) and

2. The Michigan State Department of Education should expand
and strengthen the Bureau of Research, Planning and
Development (Thomas, 1968, p. 345).

The report was well accepted and thoroughly read by those interested
in Michigan education. The report was a good base from which to set a
direction. The goals of the 1970s were to be greater equity and equality in
Michigan education, and evaluation, in the broadest sense, was to provide
the leverage for the changes.

INITIATING STATE ASSESSMENT

Acting on the recommendations of the "Thomas Study," the State
Superintendent, Dr. Ira Polley, reorganized and enhanced the evaluation
capabilities of the Department by creating the Bureau of Research within
the Department. Staff for the new Bureau was hired from bright, recent
Ph.D. graduates of universities like Chicago, Columbia, Illinois, Oregon,
and Michigan. These persons not only brought new and different skills to
the Department, but also the commitment to use these skills for
educational improvement. Nearly as soon as the small staff of four or five
were hired, they began to discuss the lack of reliable information about the
status and progress of educational achievement in Michigan. This small
group was familiar with, and intrigued by, the research and writings of such
men as Benson, Fox, Holland, Coleman, Thomas, Levin, Bowles, and others
(Kearney, 1970, p. 5). The group generally embraced the input-process-output
research model used by many of these investigators and saw
as important to the state agency the answer to the question: "What are the
correlates of educational success?" (Wilbur, 1970). Thus, staff discussions
led to the development of a paper which suggested a statewide assessment
effort to determine the status and progress of basic skills achievement and
factors related to it. The paper was shared with the State Superintendent
who was very positive and asked for alternative strategies for
implementing the idea.
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As the Superintendent and staff were considering the statewide 
assessment of achievement, the State Board was looking for ways to get' 
information on the quality of education in Michigan. They were interested 
in the accreditation function then being fulfilled by -the University of 
Michigan.4 At its meeting of January 15, 1969, the Board discussed the 
accreditatidn process. It was obvious from information v;hich came to the 
Board that taking over accreditation could not be easily s^pcomplished' . • • 
the University wanted to keep it. More impbrtantly, the lack of any 
demonstrated relationship between accreditation and- achievement was a 
major concern. Thus, when Dr, Polley introduced the assessment idea as an 
alternative, it was well accepted. Staff were asked to provide plans for a 
statewi<Je assessment for the State Board's review; Proposals were placed 
before the Board in January,^ February,^ and- April* and were 
thoroughly discussed and revised. In April the Board passed a resolution 
which directed' the State Superintendent to seek legislation that would 
provide the authority and funding needed to carry out an assessment in 
1969-70, and to do long-range planning for a more comprehensive program. 
The Board emphasized that the basic skills assessment should also include 
information about the conditions under which ^e schools operated. 

The legislature during the session had three other evaluation, 
assessment, or statewide testing bills introduced. In addition, the 
Governor's "Blue Ribbon" panel was about to recommend some kind of state 
assessment as part of the education reform package. While the task of 
getting the authority and funding for the assessment was not easy, the 
timing was right for approval. The State Superintendent was successful in 
gaining legislative support, and the assessment program was added to the 
Department appropriation bill for 1969. 

The Governor signed the bill in August, 1969. The mandate was to 
administer a statewide assessment of the basic skills prior to January 31, 
1970. Staff began immediately to plan for an assessment which would yield 
reliable data on reading, English usage, and mathematics skill levels for 
Michigan school districts and provide an indicator of the level of basic 
skills achievement among the districts so that the disparities could be 
described and policies considered to address the problem areas. For the 
first time the state agency would be gathering information about the levels 
of achievement in the school districts of the state. The State Board would 
have information about the system it was to supervise. 



IMPLEMENTING STATE ASSESSMENT 

The State Board, in cooperation with the Legislature and Governor, had 
taken a major step by obtaining approval for a state assessment of the 
basic skills. However, before the first assessment was done in January, 
1970, there was a changing of the guard. Ira Polley resigned as State 
Superintendent and was replaced by John W. Porter. In making this choice 
the Board made a commitment to a pro-active and highly visible role for 
evaluation. Porter brought a philosophical commitment to use of data in 
the management of the educational enterprise at both the state and local 
levels. To Porter, evaluation was critical to managers. He was to define 
educational evaluation as "a process of obtaining, for decision making 
purposes, information concerning educational activities" (Porter, 1972, 
p. 3) and emphasized his commitment by saying, ". . . we are committed to 
developing educational evaluation into a fruitful and productive exercise. 
We, in Michigan, are not content to treat evaluation as that useless 
exercise required from on high that takes time and pain to produce but 
which has very little significance for action" (Porter, 1972, p. 3). Porter, 
as State Superintendent during the 1970s, was to be the driving force 
behind state efforts in evaluation, and personally used the data provided to 
him. 

Porter took office in October, 1969. The first state assessment was 
conducted the following January, 1970. The 1970 administration included 
the collection of data on student achievement as previously noted, but also 
included data on the socio-economic levels of the schools and districts and 
general pupil attitudes. This was accomplished by administering a "General 
Information" questionnaire which contained twenty-six questions. Students 
responded to the questions anonymously. The purpose of the questionnaire 
was to provide the information needed to estimate the group 
socio-economic status, and the pupil attitudes toward self and school for 
each school and district. This was seen as necessary information to 
describe the conditions of education for the State Board, and for comparing 
group test results from year to year. 

Some groups saw the questions in the "General Information" part of 
the assessment as unrelated to the purpose of assessment of the basic 
skills, and even worse, an invasion of personal privacy because of the 
questions asked as proxies for socio-economic status. The press picked up 
the complaints of educators and parents, and then legislators got into the 
debate. Department staff spent considerable time and effort explaining 
the need for these data, and defending their collection. Finally, as time 
passed and other issues arose, the controversy abated, but was to re-arise 
each year until the State Board in 1973 directed the State Superintendent 
to eliminate the socio-economic status (SES) feature. It was recognized 
that these data were valuable for the proper analysis of the basic skills 
assessment data, but it was just not politically viable to keep this 
instrument as part of the program. The policy decision to eliminate it was 
made on political rather than on technical grounds. At the same time the 
SES feature was eliminated, the questions used for constructing the 
attitude scales were eliminated because of technical deficiencies. 
Although these were corrected a year later, in 1974, they were never 
reintroduced to the program. 

Another controversy the first year was raised by legislators at the 
request of their constituents. They attacked one of the reading passages in 
the test because it was "a blatant attempt to inculcate anti-American and 
anti-free enterprise values in school children." The Department staff 
discussed these issues with the legislators and were able to avoid serious 
action against the assessment program. The compromise solution included 
changing the reading passage for the next year. 

The time between passage of the assessment legislation and its first 
administration was only four months. The short timeline did not allow the 
development of the long-range plan for assessment. The lack of a plan 
produced uncertainty and distrust among local educators over the ultimate 
purposes and uses of the tests and the data they would yield. This 
opposition was further stimulated because the program was new, it was 
championed by a new State Superintendent with whom they had little prior 
experience, and it was an intrusion by the state into local educational 
autonomy. The State Board and Department were seen as pushing out to 
exercise new authority and were using the assessment program as a 
vehicle. The program became the focal point of opposition. 

The results of the first assessment were sent to local school districts 
without fanfare. There was some interest from the press in reporting 
"scores" of local districts, but the Department deferred requests to local 
districts, or was able to convince the media people not to report them. 

The Department used the results for two major purposes: (1) analysis 
of the correlates of educational success, and (2) as one of the criteria to 
determine school district eligibility for state compensatory education 
funds. The analyses to identify correlates of school success confirmed 
other such studies; the correlations between mean achievement as 
measured by the test and percent minority students were about .50,10 and 
between achievement and mean socio-economic status of students were about 
.60.11 The analyses were disappointing because all other correlations 
between composite achievement and expenditures, staff training, salaries, 
size of district, etc., were less than .20.12 Again, the variables in 
control of the school managers were disappointing. The Department was to 
repeat the studies the next year with the same results, and then dropped 
such analyses from the program. 

The second major state use of the results was as one of the criteria for 
determining school district eligibility for funds under the state 
compensatory education program. Eligibility for funds had previously been 
determined on the basis of socio-economic indicators (e.g., similar to the 
ESEA Title I use of aid to dependent children, family income, etc., as 
indicators). There was a strong feeling among Department staff led by 
John Porter that eligibility should be more directly determined by a 
measure of "educational deprivation," i.e., low basic skills achievement. A 
position paper which advocated the use of mean district scores on the state 
assessment as one of the criteria for eligibility was developed and was 
adopted by the State Board of Education. The idea was well received in 
the legislature, and the state compensatory education program (Chapter 3 
of the State School Aid Act) was amended for 1971. The policy makers 
strongly believed the direct achievement measure was a better criterion 
for directing funds to alleviate low achievement problems. Later, Chapter 
3 was amended to make results on the state assessment the sole criterion 
for district eligibility. Chapter 3 will be more thoroughly discussed in a 
later section of this chapter. 

STATE ASSESSMENT: LOCAL EDUCATOR REACTION 

The state assessment the first year had been authorized and funded 
through the Department budget bill. This was an expedient method, but 
only a temporary one. The "Governor's Task Force on Educational Reform" 
had reported the need for a continuing measure of pupil achievement. The 
Department staff developed a draft bill which provided continuing 
authority for the program. Staff were concurrently working on revisions to 
the state compensatory education legislation to include educational 
deprivation in the criteria. It was natural that the draft bill to authorize 
state assessment as a permanent program would tie in compensatory 
programs. The draft bill with some minor changes was enacted as Act 38 
of the Public Acts of 1970, and remains as the legal base for the program 
today. The legislation broadly states: "A statewide program of assessment 
of educational progress and remedial assistance in the basic skills of 
students in reading, mathematics, language arts and/or other general 
subject areas is established in the department of education . . ." 

The provisions then go on to give various elements of the program. 
Included are: establish achievement goals, provide information useful for 
allocation of state funds to equalize education opportunity, provide 
incentives to introduce programs to improve basic skills or attainments, 
and provide the public information on the school system. With this 
legislation, the state assessment had a definite mandate. 

The assessment tests from 1970 were revised for use in the 1971 
assessment. The tests were lengthened so that each fourth and seventh 
grade pupil would take a full test battery for reading, mechanics of written 
English, and mathematics. The tests would now yield scores reliable 
enough for reporting individual pupil scores, as well as aggregate scores at 
the school building and district levels, and a state report. The tests were 
administered throughout the state in January, 1971, along with the still 
controversial socio-economic status and attitude scales. 

Before the test administration period was over, a group of local 
superintendents met to review this "new" state program of assessment. 
These discussions led to action by some thirty-eight of them. They ordered 
that the test answer sheets be held in the district and not sent to the 
scoring service. The press picked up the story and the state assessment 
became a big story . . . the program had visibility! 

After two weeks of unsuccessful discussions where state officials tried 
to convince the superintendents to send in the answer sheets for scoring, 
the State Superintendent and the President of the State Board of Education 
sent a joint letter to local superintendents and board presidents.14 The 
letter cited "Act 38" authority for the assessments, directed the submission 
of answer sheets, threatened court action and offered to discuss the 
superintendents' concerns. The superintendents, though reluctant to 
comply, chose not to challenge the state authority further. 

In the ensuing discussions, local superintendents raised several issues. 
The major issue was, of course, the intrusion of the state into local school 
affairs. Each of the seven or eight meetings between Department staff 
and superintendents began with this issue and required a rejustification of 
state assessment and the state authority. Other charges were made, such 
as: (1) the tests were invalid and did not correspond to the "Michigan 
Curriculum," much less the "unique" curricula of the many local districts; 
(2) the tests were ill constructed and unreliable; (3) the test information 
was no different than that already known from local testing; (4) local 
educators had no opportunity to participate in the planning of the state 
assessment program; and (5) reports of results would be made to the 
public. The criticisms were in part acknowledged by the staff and promises 
were made to be more responsive to the involvement and data needs of 
local educators. The technical issues, i.e., reliability and validity of the 
tests, were defended and final disposition of the charges was left to the 
publication of the technical reports. 

After nearly seven months of monthly meetings, the superintendents, 
though still not satisfied, decided further discussions were unnecessary. 
They would cooperate in the future, and the Department would form an 
advisory council to help form the future of state assessment. 



PUBLIC RELEASE OF RESULTS 

The promise had been made by Department officials in the first year 
of the program that the state would not release the "scores" of individual 
school districts. With this promise, local school officials felt secure with 
the new program and cooperated in the administration with a minimum of 
objection. Scores were reported directly to local school superintendents 
and to state officials. 

In the second year, the press, and eventually legislators, made inquiry 
about the "scores" and were told they could not be released. This led to a 
confrontation. An influential legislator threatened to introduce legislation 
which would mandate the release of the state assessment data for schools 
and districts and would provide guidelines for the release. After 
discussions with the legislator, the Department policy on release of these 
data was changed and the basic skills assessment data were provided to the 
press and legislators. Even today, ten years later, the promise which 
couldn't be delivered, i.e., no public release of school or district results, is 
remembered by some superintendents. 

The first release of results was made in response to individual 
requests. However, the interest was great and the State Superintendent 
decided to publish the results for all districts. The first to be published 
were the 1971 results. A compilation of data (assessment test results, 
staffing, financial, dropout, etc.) was made. A book was put together and 
released about a year after the test administration; ironically, the book had 
a red cover and the press conference for its release was on Valentine's Day, 
1972.15 Superintendents had no love of the assessment, and saw no 
humor in this. The 1972 results were published in like form, the book being 
brown and the release at Thanksgiving.16 The "thanks" of local 
school people was that this was the last book of all district results 
published by the state. The "heat" was too much and the Superintendent 
decided results from 1973 were to be released on request, but no 
compilation of all districts was released. 

After 1973, rank ordering of school districts was not done. The tests 
were changed from norm referenced, which could be reported succinctly in 
standard scores and percentiles, to objective referenced, which were 
reported in proportion of pupils mastering each objective. Since there 
were over sixty objectives, the publication of even district level data for 
all districts was too burdensome. Assessment results were made available 
on request, and often were listed in the newspapers. 

The public release of results was made even more necessary in 1977 
when the State of Michigan enacted the Freedom of Information Act which 
required public agencies to make information available upon request. 

The policies on release of assessment scores have been influenced by 
educators, legislators, public advocacy groups, and assessment technicians. 
The "pull and tug" to derive a policy involved the desire for widespread 
public disclosure on the one hand, and the fear of misinterpretation and 
misuse on the other. School people feared invidious comparisons of 
schools, and judgments of school effectiveness based on a narrow set of 
measures. Public advocacy groups pressured for full disclosure and 
specialized reporting for subpopulations, e.g., racial-ethnic groups. 
Assessment technicians counseled caution in generalizing from the data 
and sought ways to provide better interpretive reports. Legislators, at 
first, wanted full disclosure, but have more recently pressed for 
recognition of the limitations of the data. In fact, the Legislature inserted 
a statement into the Department budget bill which prohibits the use of 
assessment as an evaluation of schools. The current policy of disclosure is 
to make results for a school or district available upon request, but to 
provide explanatory and interpretive materials along with them. The State 
Board in 1979 sponsored several workshops for local educators and the 
press. The Board's purpose was to assist local educators to work with the 
press to achieve full disclosure with responsible reporting. Also, the State 
Board adopted a policy which stated that assessment results were not 
appropriate to use in the evaluation of an individual teacher. 



EVALUATION OF SCHOOLS 

Tied closely to the public reporting issue are the issues of use of 
assessment results in making decisions. "How good is my school and 
district?" has always been the prime question of interest to citizens. 
Before state assessment, the judgments undoubtedly were made on criteria 
ranging from hearsay, to athletic teams, to the number of graduates 
getting scholarships, to any of numerous other factors. State assessment 
was of interest as a criterion for judging the worth of a school or district 
because it was reporting on how well pupils were learning. After all, 
schools existed to teach the basic skills and should be rated on how well 
this was accomplished. Newspapers reported scores and pointed to high 
scoring districts (mean of pupil scores in the district was used, and later 
the proportion of pupils who mastered more than 75% of the objectives 
tested) as "good," and the low scoring as "poor." Real estate agents, too, 
tried to use the scores, if it suited their purpose, to steer customers to buy 
in "good" districts. 

The comparison of schools and districts on assessment scores alone 
concerned school administrators. They carried their dissatisfaction to key 
legislators as well as State Board members. Under pressure from the 
legislators, the Department initiated a large campaign to assist local 
educators in the proper and full reporting of results. Advocated were 
early reporting, and reporting in the context of other information about 
education, i.e., the financial, staffing, and other conditions of education. 
The idea was to put the assessment scores in a larger context to provide 
for a fuller understanding and a better "evaluation" of the schools than a 
simple judgment made on one set of test scores. 

MAXIMIZE POLICY USE OR INSTRUCTIONAL USE 

Initially (i.e., 1970-73) the state assessment program used norm 
referenced tests developed to Department specifications by a testing 
company using existing items. The primary purpose of the program was to 
measure the status and progress of basic skills achievement in the state 
and its districts. These tests in reading, mathematics and mechanics of 
written English provided data for these purposes with a minimum of 
expense and testing time. An aggregate achievement "score" for the state 
in each area was computed, as was a "score" for each district in the state. 
Districts were easily ranked by percentiles, and districts in need of 
assistance were easily identified. State policy purposes were well served 
by the use of norm referenced tests. 

Politically, though, there was discontent with the program. The 
discontent involved: (1) the use of the results to compare school districts, 
(2) the tests did not provide information useful to schools in instruction, 
and (3) the tests were not "Michigan tests" and Michigan educators had not 
been involved in creating them. 

After the furor created by the superintendents in 1971, the State 
Superintendent decided both to be responsive to the issues raised about the 
norm tests, and to exercise state leadership in basic skills curricula for the 
state. It was decided to change from norm referenced to objective 
referenced tests for the state assessments. The decision would switch the 
emphasis to maximize the instructional and curriculum uses of the results 
at the local level, rather than the policy use at the state level. 

The State Superintendent met with each of the statewide curriculum 
organizations (i.e., mathematics, reading, science, social studies, health 
education, physical education, art, music) and challenged them to specify 
the basic expectations for their area. The basic expectations were, in 
general, defined as what every pupil should be able to do and should know 
at the end of grade 3, grade 6, and grade 9. These were to be "minimal 
expectations" for all pupils in Michigan schools and would be strongly 
advocated as the minimum curricula for all schools in the state. The 
curriculum organizations, after much discussion, all chose to respond and 
work with Department curriculum specialists to specify the "minimal 
expectations." 

During late 1971 and early 1972 the curriculum specialists drafted the 
expectations. These were reviewed and, in some cases, revised by 
committees of generalists (i.e., teachers, principals, school board members, 
school administrators and parents). Finally, in 1972 the State Board 
adopted the first two sets of expectations or objectives. These were the 
reading and mathematics objectives which were to be used in the new state 
assessments. 

The tests were constructed from the objectives. The Department 
engaged some local school districts to provide: (1) teachers to write test 
items, and (2) classrooms for test tryouts. The tests were to be written 
primarily by Michigan teachers based on Michigan produced objectives 
. . . these were to be "Michigan tests." The Department also contracted 
with a testing company for support services to insure that the new tests 
would meet high technical standards. 

The tests were completed and ready for use in the fall of 1973. The test 
administration time was changed from January to September-October with 
the initiation of the objective referenced program. This was done because 
of the emphasis on instructional uses. The early administration allowed the 
return of results early so that individual pupil needs could be identified and 
teachers would have time during the school year to provide remediation, if 
needed. The reports would identify the objectives mastered, and those not 
mastered. Mastery was defined as answering correctly at least four of the 
five test items for each of the forty mathematics and twenty-three of the 
reading objectives. The reports contained detailed information compared 
to the general information contained in the norm test reports. The detail 
of many scores made it more difficult to compare schools and districts on 
the basis of state assessment, but made the information more valuable to 
principals and teachers. 
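
The mastery rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the objective labels are hypothetical and do not come from the Department's scoring system. Only the rule itself (five items per objective, at least four correct) is taken from the text.

```python
from typing import Dict, List

ITEMS_PER_OBJECTIVE = 5  # from the text: five test items per objective
MASTERY_THRESHOLD = 4    # from the text: at least four answered correctly


def objective_mastered(item_scores: List[bool]) -> bool:
    """Return True if a pupil mastered one objective (hypothetical helper)."""
    if len(item_scores) != ITEMS_PER_OBJECTIVE:
        raise ValueError("each objective is assumed to have exactly five items")
    # True counts as 1, so the sum is the number of correct answers.
    return sum(item_scores) >= MASTERY_THRESHOLD


def pupil_report(responses: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Map each objective label (hypothetical) to mastered / not mastered."""
    return {label: objective_mastered(items) for label, items in responses.items()}
```

A report built this way lists one mastered or not-mastered flag per objective, which is the kind of detail the text says made school-to-school comparison harder but classroom use easier.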

The State Board had used the state assessment program to exercise 
leadership in Michigan education. For the first time a common curriculum 
had been specified, albeit it was only a minimal level and was suggested 
rather than mandated. The minimum expectations, though, were to 
become useful in promoting equal educational opportunity initiatives in 
schools of the state. 

STATE LEVEL USES 

The change to objective referenced tests and the more detailed 
reports was responsive to local education criticisms. However, when the 
first reports were released, the press and state officials were confused by 
the many figures. They wanted to be able to tell whether or not schools 
were doing better than last year, and which were "good" and which were 
"poor" achieving schools. There was a demand for a simple summary type 
report. The State Superintendent asked for a single score. 

The political pressure from State officials led to the development of a 
summary type report. The report was added in 1974 and was called the 
"proportions report." The report gave the percent of pupils whose share of 
objectives mastered fell in each of four categories (i.e., 0-24, 25-49, 50-74, 
75-100 percent). The reports were in reading and mathematics and were produced 
for schools, districts, and the state. The fewer figures were more 
understandable and useful to laypersons and for state purposes. 
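
As a rough sketch of how such a summary could be tabulated, the Python below groups pupils by the percent of tested objectives each one mastered and reports the percent of pupils in each of the four categories. The function names, the input layout, and the handling of values that fall between category edges (for example 24.5 percent) are assumptions made for illustration; the text gives only the four category ranges.

```python
from typing import Dict, List

CATEGORIES = ["0-24", "25-49", "50-74", "75-100"]  # ranges named in the text


def category_for(percent_mastered: float) -> str:
    """Place one pupil's percent of objectives mastered into a category."""
    if not 0 <= percent_mastered <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: fractional values are assigned by lower bound.
    if percent_mastered < 25:
        return "0-24"
    if percent_mastered < 50:
        return "25-49"
    if percent_mastered < 75:
        return "50-74"
    return "75-100"


def proportions_report(objectives_mastered: List[int],
                       objectives_tested: int) -> Dict[str, float]:
    """Percent of pupils in each category for one school, district, or state."""
    if not objectives_mastered or objectives_tested <= 0:
        raise ValueError("need at least one pupil and one tested objective")
    counts = {c: 0 for c in CATEGORIES}
    for mastered in objectives_mastered:
        counts[category_for(100.0 * mastered / objectives_tested)] += 1
    total = len(objectives_mastered)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}
```

Four numbers per subject replace dozens of objective-level figures, which is the trade the text describes: less instructional detail in exchange for a summary laypersons could read.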

The proportions reports were used to set criteria for identifying levels 
of needs in Michigan schools (e.g., schools with fewer than 50 percent of 
the pupils mastering 75 percent or more of the objectives were defined as 
high needs schools). The State Superintendent and staff directed special 
assistance to these schools in an effort to assist them to improve. 
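
The example criterion in the preceding paragraph can be written out as a short check. This is a minimal sketch, assuming each pupil's result is already expressed as the percent of objectives mastered; the function name and input format are hypothetical, and the text presents this threshold only as one example of the criteria used.

```python
from typing import List

MASTERY_SHARE = 75.0    # from the text: 75 percent or more of the objectives
PUPIL_CUTOFF = 0.50     # from the text: fewer than 50 percent of the pupils


def is_high_needs(pupil_percent_mastered: List[float]) -> bool:
    """Flag a school as high needs under the example criterion in the text."""
    if not pupil_percent_mastered:
        raise ValueError("need at least one pupil")
    # Count pupils who mastered at least 75 percent of the objectives.
    high_masters = sum(1 for p in pupil_percent_mastered if p >= MASTERY_SHARE)
    # High needs means strictly fewer than half of the pupils reach that level.
    return high_masters / len(pupil_percent_mastered) < PUPIL_CUTOFF
```

Note that a school where exactly half of the pupils reach the 75 percent level is not flagged, because the text says "fewer than 50 percent."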

The assessment program reflected an action policy of the Department 
of Education to seek and use information to develop a concerted program 
for educational improvement. However, it was too general to provide 
information to assist in determining the success or effectiveness of 
specific program efforts. Thus, concurrent with the development of the 
assessment program, the evaluation program was also being developed. 



INITIATING PROGRAM EVALUATION 

As noted in an earlier section, the advent of extensive federal 
involvement in education, in particular, the Elementary and Secondary 
Education Act of 1965 (ESEA), provided a new impetus for State Education 
Agencies across the nation. In addition to creating an expanded role for 
State Education Agencies in educational program development and 
administration, ESEA demanded a more active role for State Education 
Agencies in evaluating those programs. 

ESEA caught most state and local education agencies unprepared to 
undertake sophisticated and technically sound program evaluation. Federal 
officials were vague in providing direction and frequently suggested 
summative questions which needed response. These questions were usually 
descriptive as well as summative in nature. Beyond these basic descriptive 
questions, State Education Agencies were encouraged to develop evaluation 
capabilities and design evaluations to best meet the needs of state and 
local constituents. 

In Michigan, as in most states, early evaluation efforts were aimed at 
meeting the summative evaluation requirements. The prevailing philosophy 
was that evaluation was a federal reporting requirement which had to be 
done in order to maintain eligibility for funds. These "reporting" activities 
were decentralized in the state agency as part of the overall responsibility 
of the persons who administered the programs. The evaluation results were 
seldom used (nor thought to be useful) in program administration or policy 
development. 

The decentralized approach and the "required reporting" philosophy 
toward evaluation began to change in Michigan in 1969 with the creation of 
a Bureau of Research. With the establishment of a new Bureau came the 
direction for the new staff to begin conducting evaluation of the new 
federal programs, and to use some of the federal money to support these 
evaluations. This new commitment was further strengthened by the 
appointment of a new superintendent who, as noted earlier, believed that 
information provided by technically sound evaluations would lead to 
improved decisions regarding educational programs. 

The early and active support by the State Superintendent resulted in a 
decision to begin recruitment and employment of a small number of 
specialized staff and to begin centralizing the function of evaluation. 
Evaluation staff were to be administratively independent of the personnel 
who were responsible for management of the programs to be evaluated. 
The new staff were asked to develop and implement a systematic approach 
to program evaluation. 

In 1974, the State Superintendent emphasized his support of evaluation 
and put the full weight of his office behind the centralization of the 
evaluation functions in the evaluation program. A supportive policy 
statement was issued; portions of the statement follow. 

It is my intent that all evaluation activities sponsored by the 
Department of Education be coordinated by staff with expertise 
in evaluation so as to maintain consistency in the evaluation 
efforts. 

After citing several negative aspects associated with decentralized 
evaluation efforts, the Superintendent's statement continued: 

If evaluation is worth doing, it is worth doing well. Furthermore, 
program administrators should never evaluate their own 
programs. Therefore, effective immediately, I am asking each 
of you to ensure that the evaluations of your programs are 
coordinated through (the Evaluation Program) which is 
. . . responsible for evaluation. 

The statement concluded by indicating actions which should be taken 
to receive approval for evaluation activities. 

In 1977, the State Superintendent repeated his statement verbatim and 
added that he expected any items which included plans for evaluation, such 
as programmatic state plans, to include a statement of support from the 
Evaluation staff before being submitted to the State Board of Education for 
approval.19 

THE EVALUATION PROCESS 

The goal of evaluation was to provide information to educational 
decision-makers so program improvements could be made. Staff were 
committed to the task of demonstrating that well-designed, carefully 
implemented and properly supported evaluation "provides objective 
information for planning, administering and improving educational services 
at all levels of educational governance, from federal and state to school 
district, to school building and to classroom levels."20 

In an enterprise so large and encompassing as education with many 
factors beyond the control of the evaluation specialist, it is impossible to 
employ the same experimental rigor which might be found in a scientific 
laboratory. In the social sciences and education it is often impossible to 
control conditions and set up experimental designs as in the natural 
sciences. Therefore, the evaluation model employed by the Department 
had three stages: (1) descriptive evaluation, (2) evaluation to determine 
success, and (3) evaluation to determine effectiveness. Descriptive 
evaluation refers principally to the quantitative description of resources 
(human, financial and material) and purposes associated with educational 
services. Evaluation of success refers to quantitative and qualitative 
judgments regarding whether or not objectives of an educational delivery 
system have been met. Evaluation of effectiveness refers primarily to 
identification of factors associated with success and the relative costs of 
assuring that those factors exist. 

While these stages are sequential in nature, they are also fluid and 
overlapping. For example, it will take some evaluations a year or more to 
pass through the descriptive evaluation stage while others will pass through 
this stage much more quickly. Also, work may be occurring in more than 
one stage simultaneously; for example, evaluation of success may begin 
before the descriptive evaluation is complete. Furthermore, the 
implementation of each successive stage does not mean that the prior 
stages are terminated. Rather, each successive stage builds upon the 
information provided by the preceding stages. 

In the last half of the 1970s, several evaluations were able to identify 
factors which are related to success. Based, in large part, on these 
evaluations, the Department is now exploring means by which, through a 
state-local partnership, strategies can be developed which will lead to 
more predictable program improvement. 

As part of the State Superintendent's policy statement,21 he 
asked program administrators to enter into "Agreements for Services" with 
the evaluation staff for the conduct of evaluations. The "Agreements for 
Services" reached between program administration staff and evaluation 
staff specify services and responsibilities of both staffs and formalize 
expectations of both parties. The agreement commits the administrative 
unit to provide a mutually agreed amount of funds to conduct the 
evaluation. The agreement commits the evaluation unit to provide 
information to answer specified program and policy questions. Staff 
employed for the evaluation are programmatically 
independent of the program administration staff. 

While administrative independence is desirable, daily substantive 
interaction among the evaluation and program administration staff is 
essential. Evaluators must be aware of, and sensitive to, the subtleties of 
the program they are evaluating. Also, informal substantive contact 
decreases the threat often associated with evaluation. 

Much emphasis is placed on communication among staff and the 
appropriate use of evaluation results. In addition to the day-to-day contact 
among staff, more formal mechanisms are used for presenting findings and 
recommendations. Formal "exit conferences" are attended by evaluation 
staff, program administration staff, and, frequently, one or more high-level 
officials of the Department. At these "exit conferences," findings of the 
evaluation and action-oriented recommendations are formally presented by 
evaluation staff. Program administration staff respond, either at the "exit 
conference" or soon after, regarding actions they plan to take on each 
recommendation. The "exit conference" reduces the likelihood that 
evaluation findings and recommendations will be ignored. The 
administrator in charge of both the program and evaluation responsibilities 
is present, supportive, and can direct actions and policy responsive to the 
evaluation findings. 

The State Board of Education has, historically, been very interested in 
the work of the evaluation staff. Great care is taken to prepare and 
present evaluation reports to the State Board which will be meaningful as a 
tool to guide in establishing policy. Frequently, major segments of time 
are set aside by the State Board of Education at Committee of the Whole 
meetings to discuss evaluation reports, recommendations and implications 
for administrative and State Board action. 

In addition to formal and informal communication efforts with 
Department staffs and the State Board of Education, evaluation staff are 
actively involved in a program of technical assistance and dissemination to 
local education agencies. These activities cross a broad range from 
distribution of executive summaries of evaluation reports to formal 
inservice or technical training sessions. 

THE CONFLICT BETWEEN EXPECTATIONS AND METHODS 

There has been a strong policy and programmatic commitment to 
evaluation in the Michigan Department. An environment has been created 
to promote the development of a strong organization for conducting 
evaluation. The commitment to use evaluation findings and act on 
evaluation recommendations has been high on the part of the staff of the 
Department and the State Board of Education. But even in an organization 
with this high level of commitment, conflicts between expectations and 
methods can and do occur. 

One of the most common areas of conflict is a conflict among program 
priorities. These conflicts are primarily of two types. One type of conflict 
is a result of a lack of clear enunciation of the purposes and priorities of a 
given program. Evaluation staff, during the "descriptive stage" of 
evaluation, work with program administrative staff to clarify the purposes 
and objectives of the program to be evaluated. This process can be a 
particularly frustrating one if there are external pressures to provide 
information quickly. The lack of clear enunciation of purposes and 
priorities becomes acute when the evaluation effort begins a lengthy period 
of time after the program begins. 

The examination of program purposes and priorities frequently leads to 
a second type of conflict among program priorities. This occurs when the 
programs have a mixture of social action and education priorities. For 
many reasons, categorically funded programs often have a multiplicity of 
apparent purposes; some establish primarily education priorities while 
others establish primarily social action priorities. For example, legislation 
may contain language which seems to equate civil rights and basic skills 
education. 

It is not uncommon for these social action and education priorities to 
be so closely intertwined that it becomes virtually impossible to distinguish 
among them. The "descriptive" stage of evaluation is used to deal with this 
problem (as a part of enunciation of purposes and priorities). However, 
even if the social action and education priorities can be identified and 
separated, these programs are especially difficult to evaluate. In some 
cases, program administrative staff have preconceived expectations 
regarding the outcomes of an evaluation. Additionally, they often do not 
fully understand that social action objectives cannot be measured by 
educational performance measures. This combination of preconceptions 
and misunderstandings can lead to great disappointment upon the 
completion of the evaluation. 

Evaluation staff must be especially careful to develop mutual 
understanding about purposes, priorities and expectations of the program to 
be evaluated. Of equal importance is the development of mutual 
understanding regarding expectations for the evaluation effort. Evaluation 
staff of the Michigan Department of Education use the "descriptive" stage 
of evaluation to develop these understandings. However, the affirmation 
of these understandings must be continuous. 

A second area of common conflict between expectations and methods 
is the conflict in requirements. This type of conflict is the conflict 
between program funding mechanisms and expected results, and most often 
occurs when programs are funded on one set of criteria and the program 
success is judged on a different set of criteria. For example, the funds are 
provided for reimbursement of program staff salaries, but the evaluation 
focus is on how much the participants achieve. This particular type of 
conflict may create hostility among local education agency staff who feel 
that it is unfair to conduct a state-level evaluation of those parts of the 
program funded locally. 

This type of conflict often establishes a negative political environment 
within which it is very difficult to conduct an evaluation. Evaluations 
fraught with this type of conflict usually will not advance past the 
"evaluation of success" stage. The evaluators can use the earlier stages of 
the evaluation to establish reasonable measurement criteria and data 
collection procedures. However, there are likely to be so many negative 
factors beyond the control of the evaluators, that some such evaluations 
will never leave the "descriptive evaluation" stage. 

Another area of conflict occurs between federal constraints and 
state-local policy and program needs. Historically, the requirements of 
federal programs have focused on summary reporting and are of little use 
in state or local program or instructional decision making. This is not a 
problem so long as state and local education agencies are able to exceed 
these requirements. In fact, if the burden of federal reporting is minimal 
and federal funds can legitimately be used to expand the evaluation of that 
program to yield results meaningful to state and local educators, a positive 
state of affairs exists. The conflict occurs when federal requirements, 
even though summative in nature, are so burdensome that all of the 
resources available must be used in meeting the federal requirements. A 
review of the history of evaluation of ESEA Title I in Michigan suggests 
that this pattern has occurred. ESEA Title I will be discussed in more 
detail in a later section of this paper. 

Recently, a second type of conflict between federal program 
requirements and state-local policy and program needs for evaluation has 
become common. Federal programs are becoming more and more 
prescriptive with regard to mandating specific information which must be 
gathered and the specific evaluation procedures which must be used by 
state and local evaluation staffs. These procedures are frequently 
insensitive to state and local policy and program needs. Further, the 
constraints are such that states can do little to design evaluations to meet 
both federal requirements and state and local needs. An example of this 
"federal prescription service" is the rules and regulations dealing with data 
collection and evaluation of programs funded under PL 94-482 (Vocational 
Education Amendments of 1976). 

PL 94-482 and its associated rules and regulations require both 
evaluation and reporting of management data (the Vocational Education 
Data System, VEDS) on every vocational program. The evaluation 
requirements, by themselves, are manageable and considered by many to be 
useful. However, the reporting requirements of VEDS are so prescriptive 
and burdensome that the Council of Chief State School Officers (CCSSO) 
has officially opposed them and threatened to refuse to comply. This 
enormous data collection burden imposed by the federal government has 
made it very difficult for evaluators to collect and analyze new data 
needed for more meaningful evaluations. 

A third general area of conflict is a conflict in commitment. 
Frequently, top level policy makers do not provide adequate support for 
evaluation activities because they have an incorrect impression of what 
evaluators do. This is especially true if the only visible product of the 
evaluation effort is an annual summary report which has little perceived 
usefulness. Evaluators need to do a much better job of helping policy 
makers understand what evaluators do. Formal and informal 
communications must not stop with program administrators. 

Another problem is that commitment from top level policy makers, as 
evidenced by resource availability, is inconsistent. Ironically, in periods of 
economic difficulty, resources for evaluation may actually increase as 
decision makers seek data to help with the management of decline. In 
economically good times, evaluation may not seem as necessary and 
resources for evaluation may become less plentiful. This inconsistency, 
even in a state with a generally high level of commitment, makes 
long-range planning for evaluation somewhat more difficult than desirable. 

A CASE OF THE INFLUENCE OF EVALUATION ON POLICY 

Michigan has had a state compensatory education program since 1967. 
In the first years of the program, funds were distributed to school districts 
as formula grants. The formula used economic, cultural and social factors 
for defining educational need. However, beginning in 1970 Michigan began 
to define educational need in terms of pupil achievement, i.e., a direct 
measure of educational need rather than a proxy measure. The state 
compensatory education program makes use of the state educational 
assessment results for this purpose. The procedure for determining school 
district eligibility for the compensatory funds, beginning in 1971, was: 

1. Pupils scoring below the 15th percentile on the 4th and 7th 
grade assessments were defined as pupils to be counted as 
"eligible." 

2. The proportion of all 4th grade pupils deemed "eligible" was 
computed; likewise, the proportion of the 7th grade pupils. 

3. Applying the proportion of eligibles in grade 4 to grades 
K-3, an estimate of "eligibles" in those grades was 
computed; likewise, the grade 7 proportion was used to 
estimate grade 5-6 "eligibles." 

4. The school districts were ranked (high first) according to the 
proportion of "eligibles" in the district. 

5. The district allocation was computed by multiplying the 
number of "eligibles" in grades K-6 (the State program was 
limited to these grades) times $200 (the funding level). 
Districts were funded in rank order until the total State 
appropriation was used. 
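
The eligibility and allocation steps listed above can be sketched in code. This is a minimal illustration under stated assumptions, not the Department's actual procedure: the `District` record, its field names, and the sample figures are hypothetical. The sketch assumes that grade 4 itself is estimated with the grade 4 proportion, that districts are ranked on estimated K-6 "eligibles" divided by K-6 enrollment, and that funding stops at the first district the remaining appropriation cannot fully cover. The text does not specify any of those details.

```python
from dataclasses import dataclass

FUNDING_PER_ELIGIBLE = 200  # dollars per "eligible" pupil (funding level in the text)

LOWER_GRADES = ("K", "1", "2", "3", "4")  # estimated from the grade 4 proportion
UPPER_GRADES = ("5", "6")                 # estimated from the grade 7 proportion


@dataclass
class District:
    # Hypothetical record; field names are illustrative only.
    name: str
    enrollment: dict          # grade label -> number of pupils, grades K-6 only
    grade4_proportion: float  # share of 4th graders below the 15th percentile
    grade7_proportion: float  # share of 7th graders below the 15th percentile


def estimated_eligibles(d: District) -> int:
    """Apply the tested-grade proportions to the untested grades."""
    lower = sum(d.enrollment[g] for g in LOWER_GRADES)
    upper = sum(d.enrollment[g] for g in UPPER_GRADES)
    return round(d.grade4_proportion * lower + d.grade7_proportion * upper)


def allocate(districts, appropriation):
    """Rank by proportion of eligibles (high first), then fund in rank order."""
    def proportion(d):
        return estimated_eligibles(d) / sum(d.enrollment.values())

    awards = {}
    remaining = appropriation
    for d in sorted(districts, key=proportion, reverse=True):
        award = estimated_eligibles(d) * FUNDING_PER_ELIGIBLE
        if award > remaining:  # assumption: stop once the money runs out
            break
        awards[d.name] = award
        remaining -= award
    return awards


def flat(n):
    """Helper for the example: n pupils in every grade K-6."""
    return {g: n for g in LOWER_GRADES + UPPER_GRADES}


districts = [
    District("A", flat(100), 0.25, 0.50),
    District("B", flat(40), 0.50, 0.25),
    District("C", flat(10), 0.125, 0.125),
]
print(allocate(districts, 50_000))
```

Rerunning `allocate` with a different appropriation or funding level shows which districts would and would not be funded, which is the kind of question a formula simulation is meant to answer.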

It was determined that the state could afford $22.5 million for the 
program before the formula for the program was written into legislation. 
Legislators used several computer simulations, each with different 
eligibility and/or fund level criteria, in the process of setting criteria. 
Basically, the data were used in making political-policy decisions. 
Legislators wanted to know which districts would be funded, or not funded, 
and at what level, before agreeing on the formula. The final formula was a 
compromise made by members of the appropriations and education 
committees of the legislature. 

The use of an achievement test indicator for determining the level of 
educational need was but one of several different features of the Michigan 
program. Others were: 

1. assurance of three years of funding once a district was 
deemed eligible 

2. provision for funding adjustments based on program success 

3. provision for annual evaluation of each pupil's progress to 
determine level of attainment 

4. provision of considerable local discretion vested in local 
districts in the use of funds 

Department staff worked together with local educators to design the 
program. The local educators were interested in three things in the new 
compensatory education program: (1) more money, (2) more discretion in 
the use of the money, and (3) greater assurance that the money would be 
available for more than one year. Each of these was attained in the new 
legislation (Rumbaugh and Donovan, 1976). 

The State, in this program, was interested in two important 
propositions: (1) could schools be held accountable for educating the 
lowest achieving pupils in the schools, and (2) could additional money for 
basic skills instruction result in higher pupil achievement? 

The program has changed over the years since 1971. When the State 
assessment changed to objective referenced tests in 1973, the criterion for 
eligibility was changed from students below the 15th percentile to students 
achieving fewer than 40% of the objectives tested. Again the legislature 
used several simulations of data to set the formula so that the funded 
districts and the funding level remained comparable to the previous years. 

The three-year funding feature was later changed to an annual 
redetermination of district eligibility and funding level. Non-funded 
districts lobbied for the change so they would have a chance for funds 
before the three-year cycle was completed. 

It was believed that all pupils, regardless of race, geographical 
location, economic status, etc., could attain basic reading and mathematics 
skills. Thus, a feature to reduce funding if pupils did not achieve was 
included. The adjustments were to be made annually on the basis of pupil 
achievement. Pupils who achieved at least 7.5 months' gain, as measured 
in grade equivalent units on a standardized test, received a full $200 
allocation for the next year. However, if achievement was lower than 7.5 
months, the district received a lesser amount (the proportion being the gain 
in months to 7.5 times $200). Local districts accepted this feature initially 
to get the money. After the first year they lobbied to retain the money 
which was going to be "lost" because "the kids still need the help." The 
money was reallocated to the district provided they filed a plan to meet 
the needs of the students who were still low achievers. After two years, 
and threats of losing more money, the districts succeeded in getting the 
legislature to delete this "accountability" feature from the program. 
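
The funding adjustment described in this paragraph amounts to a simple proration of the $200 allocation. The following is a minimal sketch, assuming that a pupil with no gain or with a loss earns nothing (the text does not say how negative gains were treated); the function name is hypothetical.

```python
FULL_ALLOCATION = 200.0  # dollars per pupil for the next year
TARGET_GAIN = 7.5        # months of gain, in grade equivalent units


def adjusted_allocation(gain_months: float) -> float:
    """Per-pupil allocation for the next year under the accountability feature."""
    if gain_months >= TARGET_GAIN:
        return FULL_ALLOCATION
    # Assumption: gains at or below zero are floored at no funding.
    share = max(gain_months, 0.0) / TARGET_GAIN
    return share * FULL_ALLOCATION


for gain in (9.0, 7.5, 3.75, 0.0):
    print(gain, adjusted_allocation(gain))
```

Under this reading, a pupil who gains half the target (3.75 months) brings the district half the allocation for the following year.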

The evaluation of the compensatory education program was linked 
very closely with the "accountability" feature. Since the funding was 
determined on a per pupil basis and that was tied to level of attainment, it 
was necessary to evaluate on an individual pupil basis. State guidelines 
called for a pre and post test (either spring to spring, or fall to spring 
administration) using approved standardized tests. Scores for each pupil 
were submitted to the State and were used both for program evaluation and 
for the determination of funding. The verification and processing of over 
112,000 pupil records was quite a challenge for State evaluators. 

The evaluations of the program showed the program to be a success. 
The districts committed themselves to developing quality basic skills 
programs based on specific performance objectives. Strategies were 
developed to provide services to low achieving children regardless of the 
school attended, thus moving away from the "target" school concept. Most 
important of all, the program resulted in improved achievement for pupils 
in the program. 

The State evaluators not only analyzed data from the school districts 
to determine program success, made recommendations for program 
improvement, and provided funding allocations, they also used the program 
to improve evaluation techniques across the State. The evaluation of 
individual achievement, as well as program evaluation, presented many 
local school educators a challenge beyond their knowledge and skill level. 
State staff were able to seize this opportunity to provide inservice training 
to improve evaluation methodology and data use in many school districts. 
Particular emphasis was placed on working with local district staffs to 
develop objective referenced tests for evaluation purposes. 

Unfortunately, the elimination of the funding adjustments based on the 
success feature made some people believe there was little need to continue 
program evaluation. Thus, funding to maintain the State evaluation staff 
was deleted from the Department budget, even though for another three 
years the mandate to provide the legislature with an evaluation report 
remained in the act. Local schools, in most cases, continued the program 
evaluation and used the results locally; however, State activities stopped 
with the withdrawal of funding. 

At the time evaluation funds were deleted, it was suggested that State 
assessment results be used to evaluate the program. The belief was, since 
State assessment was used to determine eligibility for compensatory 
education funds, the same test, over the same objectives, should be used 
for evaluation. Very simply, it was thought the fourth grade results would 
show success through the first four years, and seventh the last three years 
of the program. There was a certain logic to the proposal; however, there 
were many fallacies: (1) State assessment results were reported for the 
total pupil population, not compensatory pupils as a subpopulation; (2) 
pupils moved in and out of compensatory programs and treatments varied; 
(3) one measure was not sufficient to evaluate program success and would 
tell nothing about why some programs succeeded more than others. 

The proposal was hotly debated in the Department, and an attempt 
was made to implement this "evaluation" by identifying individual pupil 
assessment data with the compensatory services the pupil received. Local 
school administrators and evaluators strongly opposed such "evaluation." 

The local staffs refused to cooperate in the coding and the attempt 
failed. After negotiations it was decided to use state assessment as a 
vehicle for collecting some data about the compensatory program. Local 
evaluators agreed to code pupils on the fall assessment as enrolled in the 
various compensatory programs, i.e., ESEA Title I, State, Bilingual. This 
allowed the State to address some questions with policy implications: (1) 
were the lowest achievers in compensatory programs? and (2) were pupils 
enrolled in more than one compensatory program? In case the lowest 
achievers were not in the program, more stringent guidelines for pupil 
selection could be imposed. Data for the second question would be used to 
address whether or not the greater benefit was to continue multiple 
funding or spread money to new pupils. The coding project is now in its 
first year and data are being analyzed. As for the more in-depth questions 
of evaluation, local districts and the Department are cooperating in special 
case studies. These studies will address the reasons some programs are 
successful and others are not. 
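
The two policy questions raised by the coding project lend themselves to a simple tabulation once pupils are coded. The sketch below is illustrative only: the record layout, the program labels, and the use of the 40% objectives cutoff to define "lowest achievers" are assumptions, not the Department's actual coding scheme.

```python
LOW_ACHIEVER_CUTOFF = 40.0  # percent of objectives attained (assumed definition)


def summarize(pupils):
    """pupils: list of (percent_objectives_attained, set_of_program_codes)."""
    low = [programs for pct, programs in pupils if pct < LOW_ACHIEVER_CUTOFF]
    served = [programs for _, programs in pupils if programs]
    return {
        # Question 1: are the lowest achievers in compensatory programs?
        "low_achievers": len(low),
        "low_achievers_unserved": sum(1 for p in low if not p),
        # Question 2: are pupils enrolled in more than one program?
        "pupils_served": len(served),
        "pupils_in_multiple_programs": sum(1 for p in served if len(p) > 1),
    }


# Hypothetical sample records for illustration.
sample = [
    (25.0, {"TITLE_I", "STATE"}),
    (30.0, set()),
    (35.0, {"STATE"}),
    (60.0, {"BILINGUAL"}),
    (80.0, set()),
]
print(summarize(sample))
```

A high count of unserved low achievers would point toward stricter selection guidelines, while a high count of pupils in multiple programs would inform the choice between multiple funding and spreading money to new pupils.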

The State Compensatory Program is a good example of using data to 
direct policy based on a philosophical belief, i.e., low achievers should be 
benefited by funds. Also, it is a good example of evaluation being used in 
the management of a program and its improvement. Unfortunately, the 
evaluation efforts were not appreciated and resources were withdrawn 
prior to the potential benefits being attained. The program continues, but 
there is no way of systematically judging its effectiveness. 



TITLE I EVALUATION - A CASE OF CATEGORICAL CONSTRAINTS 

The preceding discussion of Article 3 evaluation has presented a case 
study of evaluation's impact on policy. The evaluation of ESEA Title I is a 
case study of the categorical constraints on the usefulness of evaluation in 
policy development. 

Title I evaluation development paralleled, in many respects, the 
evaluation of Article 3. In fact, this federal program, more than any other 
single program, provided impetus for evaluation in the Michigan 
Department of Education. 

In the late 1960's and early 1970's, the federal reporting requirements 
for ESEA Title I were quite minimal. Summative information of a 
descriptive nature was expected and achievement data were desired by 
federal officials. However, it was recognized that many state and local 
education agencies did not have the capability to conduct more 
sophisticated evaluations. Federal officials encouraged state and local 
officials to develop capability to do evaluations that exceeded the minimal 
expectations. 

Thus, during the late 1960's and early 1970's the evaluations of Title I 
conducted by the Michigan Department of Education were almost entirely 
descriptive, consisting initially of baseline information such as number of 
students, number of teachers, and amount of money spent. Beginning in 
1972, the evaluation started to yield accurate and useful information 
regarding success (in terms of achievement of students) of the program 
across the State. This information was based on district level information. 
Through the 1974-75 school year the design remained relatively constant so 
as to verify findings of success.22 

State and federal officials were able to say with considerable 
confidence that Title I in Michigan was successful. However, the "success 
evaluation" did not provide sufficient information to enable state and local 
officials to identify or select specific strategies associated with success. 
These might be used for improving local projects which were not 
successful. Consequently, since 1975-76, the evaluation of ESEA Title I in 
Michigan has focused on the building as the level for data collection and 
analysis.23 

In the evaluation of Title I, the Department of Education evaluation 
staff have been successful in identifying a number of variables related to 
success. Further, these variables have been verified by other studies. 
Thus, evaluation of ESEA Title I in Michigan has advanced into the 
"evaluation of effectiveness" stage discussed earlier in this paper. 

In addition to the progress in increasing the sophistication of the state 
level evaluation, evaluation staff have worked with local education 
agencies to make local evaluation efforts more useful. For many of the 
same reasons that the assessment program switched to objective referenced 
tests, local districts were encouraged to develop objectives and objective 
referenced tests for evaluation purposes. Together, state and local 
officials worked to make the objective referenced tests useful both for 
instruction and evaluation. In order to assure high standards of quality for 
locally developed objective referenced tests, Department staff produced a 
quality control system for objective referenced tests which was generally 
followed by local districts (Schooley, et al., 1977). Locally developed tests 
had to meet the standards of this quality control system before they were 
approved by the Department for use in evaluation of ESEA, Title I. 

The development and planned use of objective referenced tests for 
Title I evaluation peaked in 1974-75 and 1975-76 at about the same time 
that the state level evaluation was reaching the "evaluation of 
effectiveness" stage. Thus, the Michigan Department of Education was 
able to take advantage of the impetus provided by Title I to develop sound 
evaluation procedures which yielded meaningful results for policy 
development and program improvement at both the state and local levels. 
Additionally, the data provided to the U. S. Office of Education about Title 
I in Michigan were of high quality. However, not all states had developed a 
high degree of sophistication and those which had done so had used 
different methods and procedures. In short, the federal policy of 
encouraging development of evaluations which were useful at state and 
local levels had resulted in data at the national level which were not 
comparable and of varying degrees of quality. 

In testimony during the debate leading to reauthorization of ESEA 
Title I in 1974, Congress expressed considerable dissatisfaction with the 
lack of comparable data to guide its deliberations. This dissatisfaction was 
specifically exhibited in the Education Amendments of 1974 and the 
Education Amendments of 1978. 

The legislation required the U. S. Office of Education to develop and 
provide to state education agencies "models for evaluation of all 
programs"24 funded under Title I ". . . to be utilized by local educational 
agencies, as well as by the state agency in the evaluation of such 
programs."25 The law further stipulated that the models should yield 
data which are comparable on a state and national basis. 26 

At the time that the U. S. Office of Education began initial development 
of these evaluation models, Michigan Department of Education staff were 
struggling to develop procedures for aggregating data from locally 
developed objective referenced tests. It was hoped that the new models 
would recognize the value of objective referenced tests and that a sound 
model for their use would be developed. It was soon learned, however, that 
such was not to be the case. 

It became obvious that the models being developed would be of limited 
usefulness to the Michigan Department of Education. Consequently, while 
supporting the need for nationally comparable data, the evaluation staff of 
the Department actively advocated the development of more flexible 
models aimed at identifying variables associated with achievement and 
greater utility at the state and local levels (Donovan and Schooley, 1977). 

In October, 1979, the final rules and regulations27 were passed 
mandating the use of three evaluation models. The models are much more 
restrictive and less useful at the state and local levels than hoped for by 
evaluation staff of the Michigan Department of Education. 

The immediate effect has been a retrenchment. Resources are now 
being spent to develop procedures to comply with the mandated federal 
reporting requirements. "Evaluation of effectiveness" has been sidetracked 
and extensive new development of objective referenced tests has come to a 
virtual halt in the evaluation of Title I in Michigan. It remains to be seen 
whether new state and local uses of the mandated evaluation models can be 
developed, thereby reducing the conflict between federal constraints and 
state-local policy and program needs. 



SUMMARY 

The role of the State Board of Education in Michigan changed 
dramatically in the last ten years. The transition was from a rather 
passive presence in Michigan education to an assertive leadership role. The 
leverage for the change was in large part because better data about the 
educational system became available to them. 

The statewide educational assessment program was initiated in 1970. 
The data from the assessments were used as indicators of the weaknesses 
and strengths in basic skills education, and to monitor progress of the 
schools and districts of the state. The State Board based policy initiatives 
in compensatory education, accountability, equal education opportunities, 
and Department services to districts on information produced by the state 
assessments. The program became the "center piece" of elementary and 
secondary education in the state. 

The assessment data were good indicators of needs, but were of very 
limited use in providing direction in dealing with the needs. A more 
in-depth evaluation of programs was needed to identify "what works" to 
produce a better educational system and higher achievement for children, 
youth and adults. The State Superintendent, in recognition of this, 
centralized the evaluation function in the Department of Education, and 
over the years was most supportive of the evaluators' work. The evaluations 
went through three phases: descriptive, success and effectiveness. Especially 
in compensatory education, the evaluation data were important in decisions 
of resource allocations, program management, and policy development. 
Whereas the assessment data provided an indicator of problems, the 
evaluation data provided the information for addressing the problems. 

Michigan education has come to appreciate the power of data in 
decision making. The State Board and State Superintendent appreciate the 
power of data and use it in forming policy. They have been better able to 
justify policy initiatives, and have been more assertive in taking initiatives 
to change and improve the educational system. Information from 
evaluations of the Michigan educational system has been used to form 
policies, and in turn, the policies have influenced the direction of the 
evaluation. 

The proactive State Board and State Superintendent in Michigan used 
evaluation activities and data to establish a state presence in education 
during the 1970's. The tenor of the times was an "outcomes" orientation 
and a promotion of equity and equality for all children in education. The 
state accepted a responsibility for setting standards, for measuring impact, 
and for assisting schools toward improvement. This was a middle road 
between the policies in other states of setting statewide graduation 
standards based on competency tests, and leaving standard setting 
completely to local initiative. 

FOOTNOTES 

1. Michigan, Constitution (1908), Article XI, Section 2. 

2. Ibid., Section 6. 

3. Michigan, Constitution (1963), Article VIII, Section 3. 

4. In Michigan, The University of Michigan historically did the 
accreditation of secondary schools on a voluntary basis. The Department 
of Education had neither the resources nor the inclination to take on this 
function. 

5. Minutes of the State Board of Education, January 15, 1969, 
Department of Education, Lansing, Michigan, p. 171-172. 

6. Minutes of the State Board of Education, February 26, 1969, 
Department of Education, Lansing, Michigan, p. 223. 

7. Minutes of the State Board of Education, April 23, 1969, 
Department of Education, Lansing, Michigan, p. 306. 

8. Act 100 of the Public Acts of 1969. 

9. Michigan Senate, Journal of the Senate: No. 7, Regular Session of 
1970, p. 83. 

10. "Michigan Assessment K-12 District Correlation," (Lansing, 
Michigan: unpublished Michigan Department of Education staff paper, 
1972). 

11. Ibid. 

12. Ibid. 

13. Act 38 of the Public Acts of 1970. 

14. Joint letter from John W. Porter, State Superintendent and Edwin 
Nowak, President of State Board of Education to Local Superintendents and 
Board Presidents, March 4, 1971. 

15. Local District Results, Michigan Educational Assessment 
Program: The Fourth Report of the 1970-71 Series, Michigan Department 
of Education, Lansing, Michigan, December 1971, p. 171. 

16. Local District Results: The Fourth Report of the 1971-72 
Michigan Educational Assessment Program, Michigan Department of 
Education, September, 1972, p. 142. 

17. Michigan Department of Education Memorandum, State 
Superintendent to Associate Superintendents and Service Area Directors, 
August, 1974. 

18. Ibid. 

19. Michigan Department of Education Memorandum, State 
Superintendent to Administrative Council, April, 1977. 

20. Stanley A. Rumbaugh, "A Review of the Status and Needs of 
Evaluation Activities in the Michigan Department of Education," (Lansing, 
Michigan: Unpublished Staff Paper, 1977), p. 1. 

21. Michigan Department of Education Memorandum, State 
Superintendent to Associate Superintendents and Service Area Directors, 
August, 1974. 

22. Michigan Department of Education Memorandum, State 
Superintendent to Administrative Council, April, 1977. 

23. See footnote #20, p. 4. 

24. Public Law 95-561, Section 183(d). 

25. Ibid. 

26. Public Law 95-561, Section 183(f). 

27. 45 CFR parts 116 and 116a, 44 FR 59152-59159, October 12, 1979. 

CHAPTER 5 

The Washington Experience 

Alfred F. Rasp, Jr. 

BACKGROUND 

On January 11, 1973, a program evaluation section was officially 
established in the office of the Washington State Superintendent of Public 
Instruction; and for the first time, at least in modern history, an emphasis 
was placed on the measurement of program impact. This does not mean 
that previous superintendents lacked interest in the success of programs, 
but it did express a new concern for generating evaluative data as a basis 
for policy making. This paper will attempt to describe both the 
organizational changes that have taken place and the interface between 
evaluation and decision making in Washington state. 

To make sure there are no misunderstandings about intent, names or 
geography, three ground rules will be established. First, this description of 
events will be neither an exposé of agency practices nor a positive 
self-serving statement lauding the efforts of the evaluation section. In the 
words of Howard Cosell, the goal is to "tell it like it is." Second, in 
addition to the use of the standard educational acronyms such as LEA 
(Local Education Agency), SEA (State Education Agency), and USOE (U.S. 
Office of Education), the Washington State Superintendent of Public 
Instruction will simply be abbreviated to SPI in the name of economy. 
Third, to avoid misunderstanding, whenever the word Washington appears 
by itself, it will mean "State of," not "D.C." People in Washington just 
prefer it that way. 

Dr. Frank B. Brouillet was elected SPI in the fall of 1972 and officially 
launched his administration January 11, 1973. His professional career 
represents an interesting blend of education and politics. He has 
professional experience from both the school and college levels. He is a 
former teacher, counselor, coach and administrator. He has degrees in 
economics and education and an earned doctorate. Perhaps most unusual in 
this background blend, he served 16 consecutive years in the Washington 
House of Representatives and consistently provided legislative leadership 
in educational affairs. 

This combination of experiences has led Superintendent Brouillet to a 
three-part educational philosophy. He professes a firm belief in the 
importance of local control, a commitment to providing the resources 
necessary for a quality education, and a dedication to the basic tenets of 
educational accountability. 

It is the third element that is of special significance to this discussion 
of evaluation and decision making. Being an insider to the workings of the 
legislature, Brouillet knew long before being sworn in as SPI that perhaps 
the only way to expand the amount of state resources for education, and at 
the same time protect and strengthen local control, was through close 
attention to accountability. He knew that maintaining or increasing the 
financial support for programs in existence and initiating new programs 
depended in large part on providing the legislature assurance of the 
following: first, that the program is necessary, that is, that the need 
really exists; and second, that the impact of the program can be measured, 
and, of course, that the results are positive. The methodology of evaluation 
plays a central role. The key question, however, is not whether one 
alternative or treatment is more efficient or effective than another, but in 
a more basic sense, does the alternative selected make a difference? Is 
there an impact? Assuring need and effectiveness become prime concerns for 
evaluation in the political accountability system. The influence of an 
elected superintendent with an educational and legislative perspective 
clearly makes an impact on evaluation practices. 



ESTABLISHING AN EVALUATION SECTION 

The rhetoric of the campaign trail— increasing programs, and 
protecting local control by establishing need and measuring 
impact— became criteria for establishing a new section within the office of 
the Superintendent of Public Instruction. This program evaluation section 
gave concrete, visible proof that the new superintendent meant what he 
said— there was clearly a place to point to on the organizational chart. (An 
organizational chart appears as Figure 1 at the end of this chapter.) 

It should be noted that in many respects the development of the 
evaluation section was a process of putting "new wine in old bottles." 
At the same time the superintendent took office, the legislature, in 
the name of efficiency, placed a limit on the staff size of state agencies, 
and so no trained evaluators were hired. The evaluation section was formed 
with personnel on hand; only the responsibilities were new. 

As time passed, the deputy superintendent was instrumental in shaping 
the section into an effective work unit and expanding the emphasis on 
program evaluation. He recognized early that in the name of objectivity 
program managers should not evaluate their own programs and that outside 
contractors could not interact favorably with the legislature. With his 
leadership, several programs were designated as priorities and small 
amounts of their administrative funds were used to establish project 
employment positions in the evaluation section and to hire staff to carry on 
the evaluation activities. This move gained both objectivity and 
credibility, as well as the required evaluation data. 

The title of the section has changed during the years to reflect new 
emphases and to better reassure the legislature and the public that the 
accountability charge is being carried out. In the beginning, the name 
Program Evaluation seemed to be the answer. It soon became apparent 
that not having a "research" descriptor appear in the organizational roster 
was causing the agency to miss important contacts. Thus the section title 
was expanded to Program Evaluation and Research. 

By the mid-seventies, however, evaluation in the SEA setting had 
generally subsumed research activities and, with the advent of "golden 
fleece" awards and other indicators of the public's low esteem for 
educational research, the title was changed to Testing and Evaluation. 
This choice dropped "research" and added emphasis to the more popular 
notion of "testing." At a time when legislative debate on questions of 
testing was long and loud, the reference to testing in the section title 
reflected the SPI's intent to meet issues head-on. 

A broadening of the accountability concept took place in the 
mid-seventies when the Seattle School District successfully sued the state 
for not meeting the constitutionally mandated duties "to make ample 
provision for the education of all children . . . and provide for a general and 
uniform system of public schools." This legal battle led directly to the 
passing of the Basic Education Act (BEA) in 1977 and to the need for 
"evaluating" LEA compliance with the provisions of the law. At the same 
time, the State Board of Education renewed its interest in expanding the 
concept of school accreditation to focus on faculty self-evaluation, that is, 
on improvement through evaluation. How better could the SPI meet these 
accountability challenges than by adding the responsibilities to the Testing 
and Evaluation section and changing the title to Testing, Evaluation and 
Accountability? The routine compliance checking activity of the Basic 
Education Act is to move out of the evaluation sphere to a more 
appropriate long-range setting. Perhaps the section title will then stabilize 
as Testing and Evaluation. 

This discussion of names may sound superficial, but it is important to 
note that the activities of the section have always received the necessary 
financial support. One hopes that the major reason for this fortunate 
circumstance is that the section staff has discharged its assigned 
responsibilities with professional competence. The changing of titles, 
however, does reflect an attempt to match the "mood of the times," 
certainly that of the legislature, and to build confidence in the SPI's intent 
to establish program need and to measure impact as a decision making base. 



CURRENT RESPONSIBILITIES IN TESTING, EVALUATION 
AND ACCOUNTABILITY 

Testing 

Currently, the major testing responsibility is to carry out the 
mandates of the state testing law passed in 1976. This law, titled Student 
Achievement Surveys and Tests, requires state testing and reporting at 
three grade levels. The SPI must annually administer a standardized 
achievement test in the basic skills of reading, language arts and 
mathematics to all fourth grade students. The results of the testing, along 
with the relationship of achievement to appropriate input variables, are to 
be reported to the legislature, LEAs, and subsequently to the parents of 
children tested so that parents can compare the achievement levels of 
their children with others in the district, state and nation. In grades eight 
and eleven, samples of students sufficiently large for generalizing to the 
entire state, approximately 2,000 at each grade level, must be tested in 
reading, language arts and mathematics and the results reported to the 
legislature at least once every four years. The law also encourages local 
school districts to conduct diagnostic testing in grade two but does not 
assign that responsibility to the SPI. 

The main intentions of the legislature in passing the law are clear. 
There was first an interest in ascertaining the impact of basic skills 
instruction. This was typified by the questions: What are the achievement 
levels of Washington students? How does Washington performance 
compare with the national? Do areas of weakness requiring special 
attention exist? A second purpose led to the display of district summaries 
of fourth grade test results, the interest being twofold: to spotlight high 
achieving districts in order to learn from their success and to isolate low 
achieving districts for special assistance. A third purpose was to provide 
parents and the public with information about the impact of schooling, that 
is, to encourage educators to more fully share information related to 
program outcomes. 

The law is implemented through heavy reliance on contracted 
services. To accomplish major tasks such as the printing and scoring of 
tests, logistical services and analysis, requests for proposals are prepared 
and sent to interested bidders. The technical proposals submitted are 
reviewed by outside panels of experts working independently. The 
recommendations of the technical review panels are supplemented by the 
SPI staff analysis of bid amounts; the superintendent makes the final 
decisions, and contracts are written with successful bidders. In 
Washington, contracts for $2,500 or more require that a competitive 
bidding process be used. Single source contracts for larger amounts must 
be justified and defended. In the case of contracting with other state 
agencies, for example, universities, educational service districts and LEAs, 
waiving the competitive bidding process is not difficult; however, when 
agencies other than those of the state are involved, great care is taken to 
explicitly follow the rules. 

Since the total professional staff responsible for the testing activity is 
less than one full-time equivalent, contracted services are necessary and 
play a crucial role. The typical pattern is one in which large contracts for 
specialized services are awarded on the basis of technical merit and 
competitive bid. The assistance of additional personnel is gained through 
contracts with the other state agencies or school districts. Specific tasks 
are completed occasionally through the use of single source personal 
service contracts under the $2,500 amount. Developing work plans and 
time schedules, preparing requests for proposals, reviewing bids, writing 
and managing contracts are necessary skills for administering the 
Washington testing program. 

The testing results are reported in several forms and through several 
channels. In the case of grade four, individual student, classroom, school, 
and district level reports, including summary data and item analyses, are 
delivered to the LEAs as soon as possible after the October testing. In 
December, the state's performance is publicly released to the LEAs and 
media. By the end of February the State General Report and District 
Level Summaries is disseminated to the legislature, LEAs and media. With 
the sample studies at grades eight and eleven, there is less information to 
report. Since a sample is used, no classroom, school or district reports 
exist. When possible, individual student results are returned to the schools, 
but the reporting consists primarily of a news release of the state's results 
followed by a general report sent to the legislature, district 
superintendents, principals of schools with the grades tested and the media. 

An additional thrust of the testing program aims at helping personnel 
in local districts to improve their skills in selecting, administering, 
interpreting and reporting test results. This effort usually takes the form 
of workshops conducted throughout the state. The first series, timed 
before the October testing of fourth grade students, focuses primarily on 
test administration. A second series, conducted after the state's fourth 
grade test results have been returned to the districts, emphasizes 
interpretation, reporting, and use of test results for instructional 
improvement. 

Evaluation 

Major evaluation efforts revolve around the evaluation of selected, 
priority programs. These are the programs in which the SPI has a special 
interest because they involve large sums of money and/or are 
compensatory or categorical in nature and/or are politically sensitive. 

For the past two years, evaluation priority has been placed on six 
programs: Title I, Title I-Migrant, federal programs for the handicapped, 
the Washington Urban, Rural, Racial Disadvantaged program, educational 
clinics for dropouts, and the Title IV, Part B learning resources program. 
Although these are all designated as important, the evaluation 
responsibilities vary from program to program, with greatest efforts in 
Title I and Migrant, and least in the areas of Title IV, Part B. 

In both Title I and Migrant, full annual evaluation reports are prepared 
for USOE. These reports are based on the computerized aggregation of 
data from applications, monitoring forms, interim reports, and year-end 
reports, as well as fiscal files and program office files. The annual 
evaluation reports describe how the program resources were used, what 
outcomes resulted, what trends developed, and what special problem areas 
existed. The annual reports also show the extent to which the state plan 
goals and objectives were met. 

In addition to the preparation of the evaluation report, the computer 
data files are summarized and printed to provide periodical management 
information for the program staffs. During the course of the year, the 
evaluators also assist in training LEA personnel to use program forms and 
procedures. 

There are two points of emphasis related to the evaluation of federally 
sponsored activities for handicapped students. For several years, the main 
responsibility was for evaluating the special state projects provided by 
federal discretionary funds. This was accomplished through year-end 
report data and on-site reviews. More recently, with the impact of P.L. 
94-142 and the mandated individualized educational programs for 
handicapped students, the evaluation staff has been working primarily to 
assist in the development of a computer processing system for management 
information, including an emphasis on organizing, monitoring and 
evaluating data. 

The Washington Urban, Rural, Racial, Disadvantaged (URRD) program 
was expanded in 1979 to include Remediation Assistance (RAP). Whereas 
the regular URRD program has provided money for a wide range of crisis 
oriented projects for the past decade, the RAP addition is strictly a 
compensatory program modeled sufficiently after Title I to qualify 
Washington for the Title I incentive grants when the federal funding 
becomes available. The evaluation section involvement with URRD takes 
many forms, including: the review of the evaluation plans specified in the 
grant applications, onsite project evaluations, the computer aggregation of 
compliance monitoring data, the follow-up study of students served, and a 
computer summary of application data. In the case of the new RAP 
component, assistance has been given in the development of program 
guidelines for LEAs and in the preparation of the reporting documents. 
The year-end evaluation activity will include the preparation of a 
statement on achievement gain in the style of Title I. 

Evaluation assistance to the manager of the Title IV, Part B learning 
resources program typically has taken three forms. The application and 
financial data are stored, aggregated and tabulated by computer for both 
interim management information and year-end reporting purposes. The 
results of LEA compliance monitoring by learning resources program staff 
are entered into the computer and aggregated. Three to five case studies 
involving onsite reviews of LEA activity were prepared by the evaluation 
staff during each of the past three years. At the end of the year, computer 
printouts of updated program information and monitoring reports, along 
with draft copies of the case studies, are delivered to the program 
manager, who is responsible for preparing the annual report for USOE. 

For the past two years, the State Board of Education has been required 
by legislation to certify education clinics organized to provide programs 
for school dropouts, and the SPI has been required to manage the funding 
process and to evaluate the programs. Because of the special legislative 
interest, the activities are politically sensitive beyond the small amount of 
money involved. The law itself calls for the evaluation of superior 
performance based on educational gain as related to the difficulty of 
educating the students and efficiency in terms of per pupil expenditures. 
The demands for evaluative precision outstrip the current state of the art. 
An achievement and superior performance report is prepared annually 
based on data aggregated from individual student record forms that are 
submitted by the clinics for each student entering and exiting the 
program. From this information, a description of each clinic is prepared 
showing a difficulty to educate factor, an achievement factor, and an 
efficiency factor. 

Accountability and Other Responsibilities 

Since the Basic Education Act went into effect in September of 1978, 
school districts must be judged in compliance, or have certain regulations 
waived, by the State Board of Education before the SPI can distribute the 
funds provided by the legislature to them. With 100 percent of the funds 
for the basic program moving to districts through these channels, the 
determination of district compliance is of crucial importance. For the past 
two years, this responsibility has been fulfilled by the Testing, Evaluation 
and Accountability section. Forms are developed and distributed, and 
reports are reviewed. Recommendations based on the district input are 
presented to the State Board of Education for action. As the board judges 
districts to be in compliance, the SPI's Division of Financial Services 
manages the apportionment of funds. Approximately one billion dollars 
flow to Washington's 300 school districts through this process each year. 

The school accreditation programs are also administered by the 
Testing, Evaluation and Accountability section. In Washington, two 
accrediting programs interface. The State Board of Education by law must 
provide an accreditation process to any school that applies. The program is 
voluntary, not a basis for funding, and available to all schools. Although 
the State Board has accredited secondary schools for years, in 1979 the law 
was amended to add elementary schools. The accreditation program is 
currently in the developmental, field test stage. The second accreditation 
program is that of the Northwest Association of Schools and Colleges. The 
Northwest secondary school accreditation process has operated in 
Washington since 1917 and currently involves approximately 180 member 
schools. Management has been a responsibility of the director since he 
joined the SPI staff in 1970 and a responsibility of testing and evaluation 
since the section was formed. Educational improvement is the goal of both 
programs. The central elements of each revolve around determining that 
the resources required for a quality educational program are present and 
conducting an in-depth self-study with external verification. The state 
board activity as part of the SEA activities may move to another section in 
the near future. 

Section personnel are responsible for a number of other 
activities, some of which are closely related to testing and evaluation and 
others related only by a great stretch of the imagination. For example, 
liaison is provided with a number of organizations, including: American 
Educational Research Association, Washington 
Association, Northwest Evaluation Association, National Assessment of 
Educational Progress, Northwest Directors of Assessment, Committee on 
Evaluation and Information Systems, Region X Title I Technical Assistance 
Center, Northwest Association of Schools and Colleges, National Study of 
School Evaluation, Association of Washington School Principals, 
Elementary School Principals Association of Washington, Washington Junior 
High/Middle School Principals Association, Washington Association of 
Secondary School Principals, High School-College Relations Committee, 
Washington Pre-College Testing Program, and the Washington Alternative 
Learning Association. 

Questions dealing with correspondence schools and high school 
graduation requirements are also answered by section personnel. In 
addition, the section is the clearinghouse for many research activities in 
areas related to testing and evaluation. A current example is the "High 
School and Beyond" study being conducted by the National Opinion 
Research Center for the National Center for Educational Statistics. A 
project just concluded provides another example. From July 1977 to the 
fall of 1979, the Northwest Reading Consortium, a four-state Research and 
Development Utilization Program established by the National Institute of 
Education, was coordinated by the section. Section staff also provide 
technical assistance in planning, testing, evaluation and research as 
requested within the state agency and outside. 

STAFFING 

During fiscal year 1980, several staffing patterns are being used to 
provide the human resources necessary to complete the assigned 
responsibilities. At this time there are eleven people regularly on the 
section's payroll. Seven of these are professional educators and four are 
secretaries. 

The seven professional staff members have all been teachers, but their 
backgrounds vary greatly. Five have earned doctorates, and they bring 
great diversity to the section because each studied a different specialty. 
For example, one received a doctoral degree in curriculum and instruction, 
another in educational psychology, a third in counseling and guidance, a 
fourth in reading, and the last in the administration of higher education. 
Of the seven, two are former principals, one a school counselor, one a 
school psychologist, one a reading specialist, and one a former state 
education association president. 

As a result of experience and graduate study, all have backgrounds in 
educational research, but none have extensive formal training in 
evaluation. This is not to say that there is a lack of expertise. Since the 
section was launched in 1973, steps have been taken to develop the 
required skills. Through individual initiative, section staff development 
activities and on-the-job training, the staff has gained a high level of 
professional competence. In the areas of large scale assessment, program 
evaluation and the use of the computer to facilitate the aggregation of 
evaluation data, the professional strength of the section is noteworthy. It 
should also be noted that the section has earned a good reputation and a 
high level of credibility, even though it frequently deals with tough topics 
that are not always viewed favorably by LEAs or by others in the SEA. 

The seven professional staff members represent two hiring patterns 
and three funding sources. Six of the seven are regular civil servants. The 
seventh is hired on the basis of special project need, and the employment 
must be renewed and approved at the beginning of each project year, 
depending on the availability of funding. The section budget is based on 
three sources of funds: state money provided by the legislature for the 
testing program and for general SPI activities, such as administering the 
Basic Education Act and State Board of Education's accreditation program; 
federal dollars for state leadership in education; and small amounts from 
the administrative money of Title I, federal handicapped, state 
compensatory and federal learning resources. 

The regular secretarial staff consists of four people. There are two 
full-time secretaries, one part-time secretary working on program 
evaluation reports, and one part-time secretary assisting in the computer 
processing of data. As overload situations arise, temporary help is added 
as required with a minimum of bureaucratic strain. 

Eleven people cannot attain all of the objectives flowing from the 
many assigned responsibilities, but with the size limitations imposed by the 
legislature, there are not staff years available for hiring additional 
permanent personnel. Steps are taken to augment the staff through the use 
of personal service contracts. In some instances, the contracts call for 
another agency to provide personnel who will work under the direction of 
the section. In other cases, the accomplishment of specific tasks forms the 
prime objective of the contract. Occasionally when the need for assistance 
is short in duration or the specific task is small in scope, a personal service 
contract may be negotiated directly with an individual. 

Using contracts has both advantages and disadvantages. Certainly 
control over the size of the permanent staff is maintained, and there is an 
efficient flexibility for peak load staffing. However, the negotiating, 
writing, defending, and managing of contracts is time-consuming and 
frequently calls for efforts over and above the time normally spent on the 
supervision of personnel. There is also a potential problem in the lack of 
staff continuity and commitment to long-range goals. Because of the 
different types of contractual arrangements used, it is difficult to estimate 
the number of full-time equivalent staff members that serve the section 
during any given year. 



RELATIONSHIPS WITHIN THE AGENCY 

There are a number of relationships with the SEA that help to define 
the roles, the responsibilities, and, in a sense, the location of the 
evaluation unit. 

Success in fulfilling the evaluation responsibilities depends on close 
and positive working relationships with program managers. The evaluation 
staff and program staff negotiate a work plan specifying the activities, 
timelines and staff responsibilities that will guide the evaluation effort 
throughout the year. The cycle of involvement typically begins with a 
review of a program's state plan. Placing emphasis in two areas, the 
objectives are analyzed to ensure that they reflect the major intended 
outcomes of the program, and the evaluation plan itself is elaborated and 
brought up-to-date. Application and reporting forms are examined to make 
sure that they will provide the information required and provide it in a 
condition compatible with computer data processing. With the assistance 
of the computer, information is aggregated and reported to managers on 
the predetermined schedule. The outline for the final report is discussed 
with the program staff, and draft copies reviewed before final printing. 
This review is conducted to provide program staff an opportunity to point 
out possible data errors and to provide a first-hand knowledge of the 
contents before the report is disseminated. 

In order to promote objectivity, the evaluation and program activities 
are clearly separated by housing each in different divisions of the SEA. 
None of the programs for which an annual evaluation report is prepared are 
located in the Division of Instructional and Professional Services, the home 
of the evaluation unit. This separation solves two problems. Since the 
program staff is not evaluating itself, there is an appearance of greater 
objectivity. At the same time, since the evaluation is being conducted 
within the agency, there are evaluators who do know program strengths and 
weaknesses (needs and outcomes) and who can provide credible data and 
testimony to legislative bodies and funding sources. This is an attribute or 
advantage that outside contractors typically do not have in Washington. 

The relationship between evaluation and agency policy making is less 
clear than that at the program level. The assistant superintendent heading 
that division which operates the programs approves the evaluation work 
plan, and the director of testing and evaluation briefly discusses the 
planned activities with the deputy superintendent and the SPI. These 
interactions, however, are frequently routine, resulting in statements like, 
"Sounds good, let's do it." The attitude is not negative or disinterested, 
rather it reflects confidence in the negotiated evaluation plan and the 
evaluation procedures being used. Simply stated, the position of the SPI 
policymakers seems to be: the job is getting done, there have been no 
great problems, funding sources are happy enough--why make changes? 
Why disrupt the process? 

There are additional relationships. A computer playing a central role 
in the processing of evaluation data generates another set of interactions; 
and good working relationships between "man" and machine are crucial to 
the smooth implementation of the evaluation process. In 1976 a 
mini-computer was acquired to help solve the problems brought about by an 
abundance of work and a shortage of staff. Although the machinery and 
programming have become more sophisticated and the amount of data 
processed has grown, the basic human tasks have remained the same. 
Efforts go into the streamlining of application and reporting forms to make 
them more efficient for the entry of data into the computer and the 
aggregation of essential decision making information. The items on the 
forms are coded, if necessary, and entered into the machine. The reports 
of LEAs are printed out and returned to them for correction. The data are 
aggregated to match the requests of program management, and the 
final updated computer files are used as the basis for preparing the annual 
evaluation report. Learning to work with the computer has been difficult 
for some program managers, but the system is expanding and providing a 
broader range of evaluation services each year. 
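
The aggregation step described above can be illustrated with a minimal sketch in a present-day scripting language. It is not a reconstruction of the section's actual programs: the record layout, the district and program labels, and all figures are invented for illustration, and the helper names `aggregate` and `cost_per_student` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, as they might look after coding and entry from
# LEA report forms. Every name and number here is invented for illustration.
records = [
    {"district": "A", "program": "Title I", "source": "federal",
     "served": 120, "spent": 48000.0},
    {"district": "A", "program": "Remediation", "source": "state",
     "served": 60, "spent": 15000.0},
    {"district": "B", "program": "Title I", "source": "federal",
     "served": 80, "spent": 36000.0},
]


def aggregate(rows, key):
    """Sum students served and dollars spent for each value of `key`."""
    totals = defaultdict(lambda: {"served": 0, "spent": 0.0})
    for row in rows:
        bucket = totals[row[key]]
        bucket["served"] += row["served"]
        bucket["spent"] += row["spent"]
    return dict(totals)


def cost_per_student(totals):
    """Derive dollars per student served, skipping groups with no students."""
    return {name: t["spent"] / t["served"]
            for name, t in totals.items() if t["served"]}


# The same files can be summarized along whichever dimension a manager requests.
by_district = aggregate(records, "district")
by_program = aggregate(records, "program")
by_source = aggregate(records, "source")

print(cost_per_student(by_program))
```

Keeping the records at the level of one district, one program, and one funding source is what would allow the same files to answer requests along different dimensions without re-entering data.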



Definition and Purpose 

In a recent roundtable discussion, members of the evaluation staff 
were asked to define "evaluation," and in every instance the responses 
represented a variation on two themes. First, evaluation was described as 
an objective process of collecting and organizing data, of judging impact 
and ascertaining value. Second, everyone agreed that the definition was 
not complete without a statement of purpose; for example, recognizing 
that evaluation is conducted to assist managers in making decisions. 

Although this definition is broad enough to encompass the generally 



successful evaluation. These revolve first around the clear delineation of 
what data are required; that is, what questions are to be answered and, 
second, around the effective displaying and reporting of evaluative 
information. 

Following this definition, the state evaluation effort includes 
deciding what data are needed to answer the key evaluation questions, 
collecting and organizing the data, providing the necessary comparisons 
and judgments, and reporting the information in a way useful for decision 
making. 

Ideally, the purpose of evaluation is to provide sufficient information 
about program alternatives so that managers can easily see the 
comparative value and make decisions that promote effectiveness and 
efficiency. The ideal situation generates information which reinforces the 
need for the program treatment and shows its impact. The evaluation 
activities ideally would follow a linear sequence in which needs were 
determined, objectives set, programs implemented, outcomes measured and 
information required to guide the next program cycle. In real life, 
however, the process is often abridged and seldom on the time schedule 
implied by the planning model. In most cases, the data generated are more 
important as formative information for program managers than as 
summary data for high-level policymakers, and certainly the data are more 
descriptive than judgmental. 

Data collection is the easiest phase of evaluation, but it must be 
clearly established what decisions are going to be made, what data are 
needed, and when the analysis and report must be ready. In general, data 
are collected about the program targets, about the actual performance or 
outcomes, and about the resources used. More specifically, several 
questions must be answered to guide data collection in the Washington 
evaluation process: What needs are addressed by the program? What 
objectives are included in the state plan? What implementation strategies 
are suggested? What outcomes are expected? What resources are 
provided? What groups are involved? Who is served? What are the 
treatments? How are the resources used? What are the program 
outcomes? What comparisons are appropriate? 

Two additional questions influence the effort. Who is interested in the 
outcomes of the evaluation? Who ought to be interested? The answers are 
important for reporting purposes, but they also help to solve delineation 
problems in the future. 

Five Uses of Data 

1. Annual reports are prepared to meet the requirements of state and 
federal funding sources. These reports generally attempt to provide the 
answers to the previously mentioned questions guiding the data collection. 

2. The data are also used to assist program managers in becoming 
more effective and efficient. Management memos are prepared as part of 
the annual reporting process, but the audience is the state's program 
managers and policymakers, not the funding sources. The goal is to help 
the state to better meet its obligations through improved practices and 
quality control. The content of a memo may vary from the comments 
related to the need for improved office practices (for example, better 
written documentation of program changes), to the highlighting of 
objectives not met or questionable fiscal practices. Although the program 
managers are not always pleased with the content, this use of evaluation 
data is viewed as a constructive practice. 

3. The deputy and the superintendent rely on the evaluation section to 
keep them informed of any special circumstance that could ultimately 
require their attention. A third major use of the evaluation data is to 
make sure that policy makers are not suddenly confronted with an 
unpleasant surprise. They want to know in advance, for example, about the 
unpredicted concerns of special interest groups, anticipated major 
management problems, and possible audit exceptions. 

4. An abundance of descriptive data is available in the computer files 
to provide information for decision making, a fourth major use. The files 
include, for example, numbers served, money spent, time in programs, 
delivery modes, staffing patterns, parent advisory committee activities, 
and program outcomes. The data are arranged by districts, by programs, 
and by funding source. The information is especially useful to program 
managers because it provides an up-to-date reflection on how resources are 
being used, that is, who is being served, in what ways, and at what costs. 
Through the aggregation of project monitoring input, the managers can also 
review which projects are out of compliance and which rules or regulations 
are causing problems in the field. 

5. Policy makers working at a different level of decision making 
abstraction use the data in two special ways. First, by reviewing the 
information available, they can keep the impact measures promised to 
funding sources as part of the accountability process more reasonable and 
within the range of possibility. This sensitivity is critical if credibility is 
to be established and maintained. Second, the data are used to support 
decisions previously made. Since often the timing of policy making and the 
collection and analysis of evaluation data cannot happen in the preferred 
sequence, the required data are estimated on the basis of past experience 
and updated when the actual data become available, hopefully to confirm 
the decision. 

Evaluation and Policy Making 

The term "policy making" refers to the actions of the top-level state 
managers as they develop the budget and the legislative thrust, and provide 
direction for the agency and the overall operation of the Washington 
common school system. This management level consists of the 
superintendent and his administrative staff, the deputy, and the five 
assistant superintendents who head the five agency divisions. The 
superintendent is an elected official, and the members of this policy group 
serve at his pleasure and are exempt from the state civil service rules. 

As specific policy questions arise, section directors, who are tenured 
state employees and provide professional continuity, are frequently invited 
to join in the policy-making proceedings. For example, the director of 
testing and evaluation provides significant input into policy decisions 
regarding the state level activities in that area. Describing this 
interaction as "providing input," however, does not reflect the full range of 
the dynamics. Although the process is not formalized in state operating 
procedures, there is an active two-way exchange. The director does 
participate in policy making related to testing and evaluation, but, perhaps 
more importantly, policy makers rely on the section director to keep 
abreast of educational, legislative, and executive activities, both state and 
federal, and to take the initiative in providing them necessary 
information. The director, in a sense, is asked to be an advocate of sound 
professional practice and to also be able to discuss the impact that 
alternative decisions would have on various components of the educational 
community with interests in testing and evaluation. In addition, it should 
be noted that just as open discussion and full input are expected before a 
decision is made, once it is made, everyone is expected to fully support its 
implementation. 

The development of the state budget provides two examples of 
evaluation's involvement in policy making. Along with the amount of funds 
requested, each item in the budget must be defended with statements 
demonstrating the existence of need and describing the measures that will 
be used to show impact. The state testing program is a specific budget 
entry, so there is a direct policy making interaction regarding the 
activities planned for the biennium covered by the budget, the amount of 
money that will be required for implementation, and how the need and 
impact will be described in relation to the mandates of the state testing 
law. Since there is a tendency in organizations supported by budget 
allocations for sections or programs to attempt to show their value in 
terms of the amount of resources they command, the budgeting process 
frequently leads to a compromise. The demand for resources is greater 
than the supply, and a compromise between what is desirable and what is 
necessary by law and required for state leadership results. The series of 
negotiations is active and positive, and reflects both the superintendent's 
and legislature's priorities. 

The testing and evaluation section is additionally involved in the 
budget making process through a technical assistance role. To maintain a 
solid reputation with the Governor's Office of Fiscal Management and the 
legislature, the needs, impact measures, and outcome data specified in the 
budget must be deliverable. Program managers and even the executive 
staff are tempted occasionally to promise data that cannot be obtained. 
They are especially tempted to promise the measurement and reporting of 
achievement gain as an indicator of impact because this is the "hard" data 
that funding sources prefer. The problem is that in most instances it is not 
possible to deliver. Over the short run, the use of high sounding impact 
measures may bring funding; in the long run, the loss of credibility 
outweighs the temporary advantage and causes major problems. The 
section is called on to help guard against this happening. 

The development of the Washington Remediation Assistance Program 
provides another example of participation in policy making. In an effort to 
gain more resources for the support of the schools, the superintendent, 
with the advice of his administrative staff, decided to seek legislation and 
funding to promote the remediation of basic skills deficiencies in the 
intermediate grades. It was decided that results of the fourth grade 
testing program, as the most "believable" data available, would serve as the 
entitlement system for the allocation of the program's funds. The three 
years of background data were used to establish the need and the 
distribution mechanism, and the experience with Title I evaluation and the 
USOE models became a key element in the evaluation design. This latter 
connection was important, since the state remediation program was being 
organized to qualify for the federal incentive grants under Title I. 

As mentioned in the discussion of the URRD program, the section also 
assisted in the development of RAP administrative guidelines and in the 
preparation of reporting forms. The achievement or impact section of the 
annual report will be prepared by testing and evaluation personnel. 



AREAS OF FRUSTRATION 

Like all evaluation units, the Washington unit has faced many 
frustrations. Some situations have been solved, others circumvented, but a 
number remain consistently unresolved and irritating. The 
following 12 situations provide brief examples of problems, both 
philosophical and practical, which continue to frustrate members of the 
Washington evaluation staff. 

1. A clear role differentiation between evaluation and research has 
not been established. As a result, there is a range of instructional 
activities which have never been verified as efficient or effective, and 
there are probably a number of teachers who work very hard to do things 
which may not promote learning. Typically, the evaluator's prime role is to 
collect and organize data which describe program inputs and outcomes. In 
most instances, the researcher is interested in developing generalizations 
which explain or predict events. Neither researchers nor evaluators 
generally develop instructional materials or procedures, nor do they spend 
time checking to see if instructional methods are faulty or misused. There 
is a depressingly large twilight zone resulting from the unchecked 
assumption that the instructional methods used in a project accurately 
reflect the research findings and are being used appropriately. In fact, 
there is almost no research or evaluation energy applied to the analysis of 
alternative intervention strategies in Washington. 

2. The role of "describing" in evaluation also requires clarification. 
The growing use of case study and ethnographical approaches suggests 
several questions: Where does the description stop and evaluation begin? 
How can the comparative statements so frequently demanded of evaluation 
be made? The potential of using these data collection techniques to assist 
in the program evaluation has not yet been fully realized, and the 
frustration of trying to harness the rich data into an evaluation statement 
with utility for an audience not on site persists. 

3. Another frustration on the order of a "pet peeve" is the inability or 
unwillingness of program managers to separate monitoring from 
evaluation. Evaluation is clearly distinguished by an emphasis on program 
impact or outcomes. The on-site reviewing of projects to ascertain 
compliance with rules, regulations, stated objectives, and negotiated 
budgets is an important management function, but it is not evaluation. To 
consider that the worth of a project can be determined by the degree to 
which it is in compliance is misleading. The responsibility for monitoring 
as a management function is moving more and more to the program staff, 
and the energy and resources of the evaluation section are focusing on the 
evaluation questions of impact and efficiency. The movement is not 
complete; and to the extent that it is not, the frustration remains. 

4. A fourth disappointment stems from the fact that too many 
educators are willing to use evaluation as an end in itself and to limit 
program emphases and alternatives to those amenable to "good" evaluation 
designs. Frequently, evaluators are blamed for causing this practice when 
they probably speak most loudly against allowing the evaluation to 
determine the program parameters. The situation has developed, or 
degenerated, in some instances, to the point where greater pride is taken in 
evaluation results than in the actual program outcomes, and this "Catch 
22" scene appears to be growing. 

5. A great frustration also results from aggregating impact and 
related data from 300 Washington school districts and watching important 
distinctions "wash out" in the averaging. There are successful projects and 
significant differences. However, implementing laws, for example, which 
call for the correlation of "appropriate input variables" with the 
achievement of grade four students often tends to obscure the situation 
rather than clarify it. 

6. Using evaluation data inappropriately to respond to outside inquiry, 
or generalizing beyond the power of the data, is a persistent frustration. 
An example, once again from the fourth grade testing, illustrates the 
problem. Frequently, letters come to the agency from people moving to 
Washington asking for help in deciding where to locate. The usual response 
is to send a copy of the fourth grade assessment report with 
district-by-district achievement results. This report shows fourth grade 
achievement scores, district level per pupil expenditures (a .08 correlation 
with achievement in 1976), and an average family income figure based on 
1974 estimates (a .47 correlation with achievement in 1976). The sending 
of the report by implication suggests that it contains reasonable data for 
deciding where to live. One might say that this information is better than 
nothing, but the frustration is that it is not sufficient information for judging 
the quality of a school district. 

7. Computer processing of evaluation data causes a frustration of 
major proportion, or perhaps more clearly, four frustrations. First, the 
battle is still being waged against the mentality that views the computer as 
something magic, with the ability to aggregate errors into precision. 
Second, the time-consuming problems of moving data from report forms 
into the machine are not fully appreciated by program managers, and the 
problems have not been solved completely. Third, working with a single, 
"one-owner" machine, as compared to a service center, gives control, but a 
minor machine breakdown causes a major disruption of service. Fourth, 
the greatest frustration results when, even after lengthy planning and 
negotiations, program managers demand answers to questions that are not 
compatible with the data collected, stored, and programmed. 

8. The problem of gaining sufficient support service in the preparation 
of evaluation reports has not been solved. The desired editing, graphics, 
layout, and printing skills are not readily available in the agency, and 
going outside for assistance is difficult because of the rules regarding the 
role of the state printer. 

9. The lack of congruence between the evaluator's "logic" and 
political decision making "logic" is also a keen source of frustration. The 
old and accepted political process of basing decisions on power 
relationships is effective in lining up support and getting certain jobs done. 
The fact that power politics works, however, does not detract from the 
frustration of presenting objective and overwhelmingly persuasive 
testimony based on accurate and logical evaluation data to a legislative 
committee and experiencing a contrary decision. 

10. Another large source of frustration arises from the inability to 
gain a clear delineation from the policy makers regarding the decisions 
they will have to make, the information they will need, and when it will be 
needed. A major reason for the difficulty is that frequently evaluation 
data play only a marginal role in policy making. In the name of 
accountability, programs must be evaluated and the evaluation report must 
indicate what needs are being addressed and what outcomes are being 
obtained. Too often this information is treated as an end product: the 
report is made and filed, but the information is not used for planning 
purposes. Evaluators by default often carry on the delineation activities 
vicariously and hope for an accurate match with policy making needs. It is 
a difficult problem to solve, but repeatedly, clarification is lacking and the 
evaluation data are not on target and not useful for policy making. 

11. An ultimate set of frustrations revolves around time. One 
problem is the nearly complete acceptance of the logical planning sequence 
which tends to mislead people into thinking that if they go through the 
steps they will automatically accomplish something. Evaluators move back 
and forth through the sequence in many different orders, and in most 
instances probably start by trying to establish what people with a stake in 
the program would be willing to accept as evidence that it is working. 

A second time consideration that is frustrating could be labeled a 
continuity problem, and one specific example will elaborate the point. 
During a recent school year, representatives of the evaluation section 
joined with the university and educational service district staff to work 
closely with 12 small school districts on the Olympic Peninsula to help in 
planning and conducting assessments and evaluations aimed at clarifying 
priorities and isolating problem areas in the curriculum. The effort was 
productive, and by the end of the school year all involved had gained 
professional satisfaction for a job well done. The teams were eager to 
start the next year's round of activities. In September, however, the 
excitement faded: half of the districts had new superintendents and one 
district no longer existed. 

A third time element is the recognition that time, or more 
specifically, timeliness itself, is a critical variable in evaluation. Perhaps 
the supreme frustration is to conduct a sound evaluation, generate useful 
information, and deliver a well documented report just after the crucial 
decision has been made. In the spring of 1974, the legislature mandated 
that the department conduct an evaluation consisting of pilot studies in 
LEA accountability and a statewide assessment of basic skills 
achievement. The law passed in April 1974, the funds became available 
July 1, and plans were developed in detail. Accountability projects were 
initiated involving a university, an educational service district, and several 
LEAs in the right geographic mix. An achievement test was developed 
using items from the National Assessment of Educational Progress, a 
scientific sample of students was drawn, and the testing scheduled for 
April 1975. All efforts were aimed at making best use of time and dollars 
and having the study's report completed by June 1975. Of course, the 
legislature met in January 1975, asked for the information, wondered what 
was taking so long, and nearly passed bad legislation. 

12. A final frustration results from the fact that SPI staff and staff 
assignments are constantly changing. For example, over the past six years 
the Title I program staff has changed in some way every year, the 
management pattern for the migrant program has changed each of the past 
three years, and the assignments and personnel in the program for 
handicapped children have also changed dramatically. Federal legislation 
and reporting requirements in all of these areas have also undergone major 
transformations. In addition, reorganization within the agency has become 
a way of life, and it seems as if the evaluation process always involves new 
people in one phase or another. Petronius Arbiter captured the idea when 
he reported his frustration in 60 AD: 

We trained hard, but it seemed that every time we were 
beginning to form up into teams we would be reorganized. I was 
to learn later in life that we tend to meet any new situation by 
reorganizing; and a wonderful method it can be for creating the 
illusion of progress while producing confusion, inefficiency, and 
demoralization. 

Of course, Washington evaluators do not have a corner on this frustration. 

IN CONCLUSION 

During the 1970s, the emphasis on program evaluation gained 
widespread public popularity, as well as the strong support of executive 
policy makers and law making bodies. In fact, few developments have 
made so thorough an intrusion into the operating practices of education. 
There has always been an emphasis on the precise measuring and 
accounting for resources, such as the number of books in the library, 
pupil-teacher ratios, and the number of hot lunches served; but the stress 
placed on the evaluation of program results has come as an integral part 
of the accountability movement. Other professions have concerned 
themselves with various forms of input-output analysis for many years; the 
decade of the 70s, however, marked the general introduction of 
accountability and program evaluation into the educational setting. The 
concern for the analysis of resources used and results gained is real and 
growing. However, the fact that there is a lot of "program evaluating" 
going on should not be interpreted as an indication of educational progress 
or be confused with claims of program improvement. 

Evaluation has come under criticism in recent times, especially from 
evaluators themselves. There has been a tendency, perhaps, to address too 
many of the tough problems related to the conduct of evaluation 
intellectually rather than practically— some of the most reputable 
evaluators are spending more time verbalizing about evaluation than 
practicing evaluation. One of the results is that new models, approaches, 
and strategies are developed and discussed, but the basic trouble spots 
plaguing "applied evaluation" remain unresolved. Crucial among these is 
the overestimation of the influence of evaluative information on 
management decisions and policy making. The intent of this paper was to 
provide background examples to use in thinking about evaluation issues and 
ways of improving practices. 

As stated earlier, few ideas have spread more rapidly to permeate the 
field of education than the concepts of accountability and program 
evaluation. Making the movement pay off with improved practices and 
better education for those served is still the challenge. 



CHAPTER 4 

The South Carolina Experience 

Paul D. Sandifer 



The purpose of this chapter is to describe the interaction between 
policy and evaluation in the South Carolina Department of Education 
(SCDE). More specifically, how does policy influence evaluation and how, 
if at all, does evaluation affect policy within the agency? 

A literal interpretation of the purpose leads one to attempt to view 
the SCDE in isolation from other state and federal agencies, as well as 
special interest groups. Although such an approach would have the 
advantage of resulting in a much briefer chapter, it would ignore the 
considerable influence of other agencies and groups in shaping the policies 
of the SCDE and in establishing external policies under which the agency 
must operate. Consequently, the focus of the chapter is not limited to the 
policy/evaluation interaction within the SCDE but also examines some of 
the ways in which various agencies and special interest groups affect policy 
and evaluation at the state and, consequently, at the local school district 
levels. 

Although the chapter focuses primarily on the SCDE and the 
interactions between it and other agencies, my perceptions of those 
interactions are no doubt colored not only by my particular responsibilities 
in the agency, but also by seven years as an administrator in the 
Wyoming Department of Education and nine years as a teacher in public 
schools in Mississippi and Wyoming. An external observer, or other staff of 
the SCDE, might have perceptions of the policy/evaluation interaction that 
are quite different from my own. Since such differing perceptions are the 
rule rather than the exception, I recognize that the thoughts expressed 
herein are but one version of the "truth." 
Definitions

Although the terms "policy" and "evaluation" are widely used, and
perhaps just as widely understood, it seems advisable to define the terms as
they are used in the remainder of the chapter. Policy, as used herein, is
defined as including all legislation, regulations, position statements and
policy statements, e.g., "the expressed policy of the State Board of
Education is. . .," the intended purpose of which is to determine a course
of action, establish a program, or provide a framework within which
decisions are to be made.

Evaluation, as used herein, is defined as the utilization of information,
obtained through a systematic process of data collection, for any of the
following purposes: assessing the impact of established policies or
programs; comparing the effectiveness of two or more programs; assessing
the degree of compliance with established policy; or influencing the
establishment of new or revised policies or programs. This definition of
evaluation deliberately avoids any attempt to draw the traditional
academic distinctions between policy studies, research, evaluation, and
assessment. This is done for two reasons. First, the common distinctions
among these terms focus more on questions asked and procedures followed
than on the use(s) made of the information collected, i.e., the distinctions
are more semantic and academic than they are real, and second, the results
of research, evaluation (in the traditional sense), assessment and policy
studies are all used, to varying degrees, in efforts to formulate or modify
policies and programs. Whether a particular data collection effort should
legitimately be labeled an evaluation seems to be more appropriately
determined by the use(s) made of the data than by the particular study
design or the procedures used in collecting the data. Regardless of the
complexity and/or degree of sophistication of an "evaluation" design, the
act of collecting data does not constitute evaluation. Evaluation occurs
only after the data are collected and then, only if the data are used as a
basis for making judgments about worth, value, or effectiveness. Although
they may not be acted upon by those in policy setting positions, the first
place such judgments are normally identifiable is in the evaluation report.

Although they are consistent with the definition given here, many of
the examples of evaluation used in this chapter will not be regarded by
academicians as "true" evaluation. However, the broad definition of
evaluation previously given is necessary in order to understand the
policy/evaluation interaction. 

Organization of the Paper

The remainder of the chapter is comprised of four major sections. The 
first, "South Carolina Public Education: State Organization and 
Administration," provides a description of the context within which my 
perceptions of the policy/evaluation interaction have been formulated. 
The second, "Policy: Influence on Evaluation," concerns the ways in which 
policy determines what is to be evaluated and the impact that policy has on 
evaluation methodology. The third, "Evaluation: Influence on Policy," 
represents my perception of the conditions under which evaluation does, 
and does not, influence policy. The final section, "Other Factors
Influencing Policy and Evaluation," examines the impact which special
interest groups have on the formulation of policy, the design of evaluation,
and the uses to which evaluation findings are put.

SOUTH CAROLINA PUBLIC EDUCATION:
STATE ORGANIZATION AND ADMINISTRATION 

State Superintendent of Education 

The Office of State Superintendent of Education was established by 
the Constitution of 1868. The Superintendent is elected on a partisan 
ballot for a four-year term and there is no limit on the number of 
consecutive terms which the Superintendent may serve. During the period
from Reconstruction until 1979, twelve superintendents were elected to
office. The current Superintendent, Dr. Charlie G. Williams, began his
first term in January, 1979.

The general duties of the State Superintendent, as prescribed in the 
School Laws of South Carolina (1976),1 include:

1. serving as secretary and administrative officer to the State
Board of Education

2. supervising and managing all public school funds provided by
the State and Federal Governments

3. organizing, staffing, and administering a State Department
of Education

4. administering, through the State Department of Education,
all policies and procedures adopted by the State Board of
Education

State Board of Education

The State Board of Education is comprised of seventeen members, one
from each of the sixteen judicial circuits and one member at large. The
members from judicial circuits are elected by the legislative delegations
representing the counties of each circuit. The "at large" member is
appointed by the Governor. The terms of the members are four years and
no member may serve consecutive terms except by the unanimous consent
of all members of the county legislative delegations from his/her judicial
circuit. The statute pertaining to the composition of the State Board
contains no provisions excluding professional educators from service on the
Board. The present chairman and five other members are professional
educators. Although the members are elected by their legislative
delegations, the practice of electing educators has received criticism from
some members of the General Assembly. During the past several years,
legislation has been introduced, but not enacted, to restrict State Board
membership to the lay public.

The general powers of the Board include: 
1. adopting policies, rules, and regulations not inconsistent
with the laws of the State for its own government and for
the government of the public schools 

2. annually approving budget requests for the institutions, 
agencies, and services under the control of the Board 

3. adopting minimum standards, for any phase of education, as 
are considered necessary to aid in providing adequate 
educational opportunities and facilities 

4. prescribing and enforcing rules for the examination and 
certification of teachers 

5. prescribing and enforcing courses of study for the public 
schools 

State Department of Education 

The administrative structure of the State Department of Education 
includes three divisions which are under the supervision of Deputy 
Superintendents who are directly responsible to the State Superintendent of
Education. Each of the three divisions, Administration and Planning,
Instruction, and Finance and Operations, includes several offices which,
collectively, administer the programs for which the agency is responsible.
Although most of the offices include two or more sections, the
organizational chart (Figure 1) does not include detail below the office
level. The organizational pattern of the Department has remained
relatively stable during the five years in which I have been an employee.
The only significant changes, the creation of the positions for associate
superintendents and special assistant for legislative affairs, as well as
placing the Office of Personnel under the direct supervision of the State
Superintendent, have occurred since January, 1979.2

The Department employs 1,079 individuals of whom approximately 
one-half are involved in the maintenance and operation of the state 
supported pupil transportation system. With the exception of the 
employees of the Office of Transportation, most of the staff are based in
the agency offices in Columbia. 

Evaluation functions within the agency are decentralized. Although no 
office title within the agency includes the word evaluation, several offices 
carry out activities which fall within the broad definition of evaluation 
which was presented earlier. There is, however, no State Board or 
agency-wide policy concerning evaluation responsibilities.

Each office which funds programs operated by school districts has, or 
assumes, the responsibility for evaluating, or monitoring the evaluation of, 
all programs which it administers. The determination of which programs 
are actually evaluated is, more often than not, a function of federal 
mandates. The offices most heavily impacted by federal mandates for 
evaluation are Federal Programs, Adult Education, Vocational Education, 
and Programs for the Handicapped. 
Two offices which do not administer funds for locally operated
programs but which are involved in evaluation activities are Technical 
Assistance and Surveys, and Research. The Office of Technical Assistance 
and Surveys conducts, at the request of local school boards, studies to 
determine needs in the areas of administration, curriculum, personnel, and 
facilities. The results of these studies frequently provide the basis for 
district planning to meet the identified needs. Although in the traditional 
sense these studies might not be considered evaluations, they do provide 
information on which policy is based and actions are taken. 

The Office of Research is involved in evaluation activities in three
ways. The first, and most time-consuming, is through the administration of 
the Statewide Testing and Basic Skills Assessment Programs. The second 
involves data collection to assess the financial impact of proposed or 
existing policies, e.g., "What is the projected cost for facilities required to 
implement a legislative mandate to reduce pupil/teacher ratios in grades 
one through three?" Finally, evaluations are occasionally conducted at the
request of other offices within the agency or as a result of a decision of
the State Superintendent.

POLICY: INFLUENCE ON EVALUATION

Organizational Influence

Whether the decentralization of evaluation responsibilities in the 
SCDE is more a function of default than conscious decision making is not 
known. However, evaluation as a recognized responsibility of state
education agencies is still in its childhood, or at best early adolescence,
and some agencies have not chosen to organize in a manner that
concentrates that responsibility in one unit. Experience gained by serving
in two state education agencies, one in which the evaluation function is
centralized and one in which it is decentralized, indicates that both
organizational patterns have their unique disadvantages.

The influence of agency structure on evaluation is evidenced in several
ways. When the responsibilities for evaluation are decentralized there may 
be no common criteria which are uniformly applied either in the 
employment of staff or in the design of evaluations. This frequently 
results in: 

1. considerable variation in the level of expertise of evaluation
staff assigned to various offices within the agency

2. a greater than acceptable degree of variation in the quality
of evaluations

3. a lack of consistency in the kinds of evaluation requirements
which the various offices impose on local school districts

When evaluation responsibilities are decentralized, evaluators are
frequently directly responsible to the administrators of the programs for
which they have evaluative responsibility. Even if objectivity can be
maintained in such situations, evaluation findings may lack credibility
because of an apparent conflict of interest. This situation, however, is not
automatically overcome by centralizing the evaluation function within the
agency. Individuals or groups external to the agency may still consider
findings with which they disagree to be merely a reflection of the agency's
bias.

With centralization of the evaluation function, the disadvantages cited 
above are eliminated or at least alleviated. On the other hand, if
evaluators are assigned to a unit that has no programmatic responsibility,
they may well be viewed with suspicion and distrust by the administrators
of those programs which are to be evaluated. Additionally,
communications across or through administrative channels in a bureaucracy
can be both slow and frustrating. 

Whether responsibilities for evaluation are centralized or
decentralized is probably less important than having an agency
commitment to "good" evaluation for the purpose of addressing policy
relevant issues. In the absence of such a commitment, it is unlikely that
agencies will anticipate information needs; consequently, they will frequently
be placed in a reactive rather than proactive role in the policy making
process. 

Federal Influence 

Policy, regardless of the governmental level at which it is created, 
apparently influences evaluation in two major ways: first, by determining
what is to be evaluated, and second, by shaping or dictating the evaluation
methodology.

Determining what is to be evaluated is, in many instances, not a 
matter of choice for any state department of education since state and 
federal statutes and regulations, e.g., ESEA Title I, may be very explicit in 
this regard. A major portion of the evaluation efforts of the SCDE is
directed toward complying with federal mandates, although federal funds
account for only approximately 14 percent of the annual expenditures for
public elementary and secondary education in South Carolina. This is not
to say that too many resources are expended in evaluating the
effectiveness of federally funded programs. It does suggest, however, that 
in the past we have probably spent far too little time and money in 
evaluating state funded programs. 

The impact that policy has on shaping or determining evaluation
methodology is nowhere more easily identifiable than in the evaluation
requirements for ESEA Title I. The federal regulations stipulate the only
evaluation methodologies that may be used by state and local school
districts. Any exceptions to the prescribed methodologies must be
approved by the U.S. Commissioner of Education. The expressed rationale
for these models is that they will yield comparable data on pupil
achievement that can be aggregated to the national level, i.e., across
school districts and states. Various critics have raised questions about the
validity of this assertion because of some unresolved technical issues
surrounding the models. Assuming, however, that the models can and do
yield reliable and valid data, another basic question is still unanswered.
How can aggregated pupil achievement data be used in addressing a policy 
issue which, on the face of it, is more social and political than educational 
in nature? Would it not be sufficient, and perhaps more appropriate, to
determine: (1) whether the target population as defined in legislation is
actually being served; and (2) what type of instructional programs or what
organizational patterns are most effective in meeting the needs of the
educationally disadvantaged?

As a member of the Evaluation Sub-Committee of the Committee on 
Evaluation and Information Systems (CEIS) of the Council of Chief State 
School Officers, I have been privileged to hear much of the debate 
concerning the Title I models. Apart from the questions which have been 
raised about the technical quality of the models and the policy relevance of 
the data, concerns have also been voiced about the "test only" approach to 
evaluation. Local district personnel are concerned that the use of Title I
funds for evaluation will be restricted to the collection of data required by 
federal regulations and consequently, their efforts to examine the 
effectiveness of other program components will be severely hampered. 

In addition to the Title I evaluation models which are already being
used, the USOE is developing models for the evaluation of programs for
migrants and for children in institutions for the neglected and delinquent
(N and D). In the first draft stage, the models for programs for N and D
are also pupil achievement based. The methodology to be used is certainly
being determined by policy which will probably be promulgated in the form
of regulations. 

Beyond the impact that specific program policy, e.g., Title I, may have 
on evaluation methodology as applied to that program, the effects often
carry over into other programs. For example, the comparability and
non-supplanting requirements of Title I, coupled with the Office of Civil
Rights regulations which prohibit grouping that results in the formation of
racially identifiable classes, virtually prohibit the use of experimental or
quasi-experimental design in evaluating programs that may have little, if 
any, relationship to the federal programs which have placed constraints on 
evaluation in general. 

State Influence

Three recently enacted South Carolina statutes pertaining to
education include explicit evaluation requirements. These are the "South
Carolina Education Finance Act of 1977," the "Basic Skills Assessment Act
of 1978," and the "Teacher Training, Evaluation and Certification Act of
1979."3

The Education Finance Act includes an "accountability" section which 
requires: 

1. the establishment of school advisory councils

2. school and district based needs assessments

3. the development of annual plans to meet identified needs

4. district participation in the statewide testing program as
prescribed by the State Board of Education

5. annual reporting of program effectiveness to the general
public and the State Board 
The Basic Skills Assessment Act, although not as explicit in its
requirements as are the federal regulations concerning Title I, contains
requirements which effectively shape methodology and instrumentation.
The law requires:

1. the administration of a readiness test at the beginning of
grade one

2. tests of reading and mathematics at the end of grades one,
two, and three

3. tests of reading, mathematics, and writing at the end of
grades six and eight

4. a test of "Adult Functional Competency" at the end of grade
eleven

The law includes a time-table for the development and implementation of 
the program and stipulates that the tests shall be criterion-referenced. 
Further, the tests shall be used for the purpose of diagnosing student 
deficiencies and as a basis for remediation. The tests are not to be used as 
a basis for promotion or non-promotion. None of these stipulations is
necessarily undesirable, but they do have considerable impact in shaping
the assessment program. 

The Teacher Training, Evaluation, and Certification Act mandates 
major changes in teacher training and certification procedures. The
requirements of the legislation include:

1. all applicants for admission to teacher education programs 
in State supported institutions must successfully complete a 
basic skills examination in reading, writing, and mathematics 

2. the development of an instrument to be used by colleges and 
universities in evaluating all student teachers 

3. the development of an instrument to be used by local school
districts in evaluating teachers during their provisional year
of certification

4. successful completion of a teaching area examination as one
requirement for provisional certification

5. discontinuing the use, after July 1, 1981, of the Common
Examinations of the National Teacher Examinations for the
purpose of teacher certification 

The legislation includes a number of other provisions, but those cited 
appear to be the ones which impact most significantly on evaluation 
procedures and methodology. 

Not only does policy influence evaluation by determining what is to be
evaluated and what methodology may be used, it also has impact on local
acceptance of programs. Local district personnel frequently perceive
evaluation requirements as being somewhat arbitrary and infringing upon
their rights to make decisions locally regarding curriculum and instruction.
Questions that are of interest to funding agencies (state and federal) may
be of little, if any, interest to local district administrators. For example,
few district administrators are interested in or concerned with the external
validity of a project or program which is operating within their school
district. Whether the project or program works in their particular district
is of more concern than whether it may be exportable to some other
district similarly situated. Evaluation design is, however, determined by
the questions of interest to the funding agency and external validity is of
interest if the agency is considering replication of the project. Although
the funding agencies may ask legitimate questions, lack of sensitivity to
local needs does little to gain the support and cooperation of local district
personnel who are implementing the program.

Not all evaluations are initiated as a direct result of statutes, rules,
regulations, or other written policy statements which may require that
evaluations be conducted. Occasionally, evaluations are requested long 
after a course of action has been determined and a program has been 
implemented. Such evaluations are usually sought as a means of either 
providing data to generate continued support for a decision that was 
initially based primarily on beliefs or for generating support for the 
expansion of the program. This always raises the question of whether what 
is being sought is really an evaluation or a "Good-Housekeeping seal of 
approval." There is no intended implication that those who seek such 
"legitimatizing" evaluations are dishonest or unethical. To the contrary, 
they are almost always sincere, dedicated individuals who firmly believe 
that their program is working, has great merit, and should either be 
maintained or expanded. Seldom have these "stake-holders" entertained
the possibility, prior to conducting an evaluation, that the results may not
support their biases. 

Although the questions addressed by such "legitimatizing" evaluations 
are not directly influenced by the policies or actions that established the 
programs being evaluated, the methodology is certainly influenced by their 
ex post facto nature. 

EVALUATION: INFLUENCE ON POLICY

A common lament of evaluators is that the results of their efforts are 
not used by decision makers. Although this is not always the case, the
situation occurs frequently enough to cause great concern among those who 
practice the art (or science) of evaluation. Assuming that decision makers 
are reasonably rational individuals who would prefer to make decisions on 
the basis of information rather than intuition, there must be reasons why 
evaluation findings are not used. Experience indicates several possibilities:

1. The conclusions of the evaluator are not germane to the
decisions that must be made.

2. The evaluation findings are not reported in a manner that
communicates to the policy makers.

3. The findings do not support the biases of those making
policy decisions.

4. A lack of credibility results either from poorly conceived or 
conducted evaluations or the reporting agency not being 
credible to the potential users of the data. 

Each of these possible reasons for ignoring the results of evaluation is
expanded to some degree in the remainder of this section. 

Conclusions Not Germane 

Educational evaluators tend to function in the manner that the label 
implies, i.e., given freedom in designing evaluations they tend to focus only 
on the educational and cost aspects of a program or policy and usually
ignore, or fail to realize, that there are often social and political, as well
as educational and financial, issues involved. The evaluation of any
program or policy may well be of interest to more than one group of
decision makers and these various groups may legitimately be interested in
answers to different questions. For example, the local district
administrator of a federally funded program may be interested only in
whether the program produces gains in student achievement beyond that
which would be predicted without program intervention. The state level
administrator's interest may extend to the question of external validity.
Administrators at the federal level may share the interests of state and
local administrators, but in addition they want data that can be aggregated
across states, whereas the funding body, Congress, may be more interested
in the social and political aspects of the program, e.g., is the group for
which the program was intended actually being served? Compensatory
education programs are a case in point. There is serious doubt that
Congress will ignore social and political issues and make significant
changes in any compensatory education programs solely on the basis of 
pupil achievement data as a measure of program effectiveness. 

Evaluations which are too narrowly focused yield results which may be 
viewed as not germane to the issue to be decided. An example is available 
as a result of an action by the South Carolina General Assembly which 
provided funding for a pilot program to reduce pupil/teacher ratio in the 
first grade from 26:1 to 20:1. The Office of Research in the SCDE was
assigned the responsibility for evaluating the pilot program. Since the
intent of the pilot program was "obviously" to determine the
cost-effectiveness of a reduction in pupil/teacher ratio as a means of
increasing student achievement, an evaluation focused on achievement gain
was designed and conducted over a two-year period. The results of the
evaluation were consistent with a large body of the literature on the
subject and indicated that a reduction in pupil/teacher ratio of the
magnitude involved was not a cost-effective means of increasing student
achievement. Shortly after the evaluation results were released, the
General Assembly enacted the Education Finance Act of 1977 and
mandated a reduction of pupil/teacher ratio from 26:1 to 20:1 in grades one
through three. Why? In retrospect, the evaluation was too narrowly
focused and failed to take into consideration the political aspects of the
issue. The South Carolina Education Association, a rather effective
lobbying group, was exerting pressure on the legislature; and as a means of
gaining their support for the total finance bill, the proviso relating to
pupil/teacher ratio was included. The question addressed in the evaluation
certainly seems to be a legitimate one but perhaps the utility of the study
could have been improved by including a survey of teachers to determine
whether they preferred to have the available funds used to reduce
pupil/teacher ratio or to provide an increase in salary. Additionally,
taxpayers, especially those who are parents of school-age children, may be
supportive of smaller classes because of their beliefs that children receive
more individual attention in smaller classes. The point is, there were other
questions, in addition to those related to gain scores, that probably would
have been of interest to the policy-makers and, consequently, should have
been included in the evaluation.

If we are really interested in increasing the use of evaluation findings 
in determining policy, more attention must be given to identifying the
social and political, along with the educational and economic, issues
involved. No one is so naive as to believe that all decisions affecting
education are made by educators; however, the narrow focus of many
evaluation designs is not consistent with our knowledge of the decision
making process. The answer to the question, "By whom will the results be
used?" is too often left until the final report is being written, in which
case, a likely answer is "no one." If evaluations are to yield policy relevant
results, the various stake-holders and the questions of interest to them
must be identified before the fact. 

Evaluation Reports That Do Not Communicate 

The policy maker who has the technical background requisite to the
interpretation of the typical evaluation report is about as rare a specimen
as a 5-foot 2-inch professional basketball player. All too frequently,
evaluators appear to be more interested in producing reports for journals or
discussions with their colleagues than in communicating results to potential
users of the data. The development of highly technical reports may work
wonders for the evaluator's ego but the intended users may be left with two
choices: (1) ignore the report, or (2) request assistance in interpreting it.
A colleague once made a learned, but incomprehensible, presentation to a
lay advisory committee. Following his one-hour monologue, during which
the committee members were attentive and polite, two committee
members made comments. The first, a newspaper editor, said, "I consider
myself to be a reasonably intelligent man, but for the past hour you have
insulted my intelligence by subjecting me to your jargon." The second, a
wealthy rancher, commented, "As a former school board member, I made it
a practice never to fund anything that I could not understand."
Unfortunately, the message the committee members were trying to convey
was lost on the speaker. The point is that as long as we evaluators focus
our communication efforts primarily on each other and the academic
community, we have little room for complaint when the results of our
efforts are not used by decision makers. 

The Office of Research in the SCDE invested considerable time and 
effort in determining correlates of achievement test scores in order to 
generate predicted school mean achievement scores, for comparison with
those obtained through the Statewide Testing Program. The intended use
was to stimulate local school administrators to take a closer look at
schools in which there were significant differences between the predicted
and obtained scores. The Office prepared and disseminated to district
superintendents a "non-technical" summary report on the project. One
recipient of the report wrote to the State Superintendent of Education and
commented that if he was really expected to understand the report, it was
probably a good thing that he had only a short time left prior to
retirement. For those of us involved in preparing the report, the easiest
thing to do was to agree that it was probably good that his time of
retirement was near. In reality, however, he was probably not the only one
of the ninety-two superintendents with whom we failed to communicate.
The reports have since been revised and are now being used as initially
intended in many of the South Carolina school districts. Unfortunately, we
do not know the extent to which the potential utility of the information
was reduced by our initial failure to communicate.
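
The comparison of predicted and obtained school means described above can be illustrated with a minimal sketch. It is not the procedure the Office of Research actually followed, which is not documented here. The single correlate, the six schools and their scores, the function names, and the cutoff of 1.5 standard deviations are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical data: one correlate of achievement per school (for example,
# a poverty index) and the obtained school mean test score.
SCHOOLS = [
    {"name": "A", "correlate": 10, "obtained": 60},
    {"name": "B", "correlate": 20, "obtained": 55},
    {"name": "C", "correlate": 30, "obtained": 62},
    {"name": "D", "correlate": 40, "obtained": 45},
    {"name": "E", "correlate": 50, "obtained": 40},
    {"name": "F", "correlate": 60, "obtained": 35},
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_schools(schools, threshold=1.5):
    """Return (name, residual) for schools whose obtained mean differs from
    the predicted mean by at least `threshold` residual standard deviations."""
    xs = [s["correlate"] for s in schools]
    ys = [s["obtained"] for s in schools]
    slope, intercept = fit_line(xs, ys)
    # Residual = obtained score minus the score predicted from the correlate.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    flagged = []
    for school, residual in zip(schools, residuals):
        if spread > 0 and abs(residual) / spread >= threshold:
            flagged.append((school["name"], round(residual, 1)))
    return flagged


if __name__ == "__main__":
    for name, residual in flag_schools(SCHOOLS):
        print(f"School {name}: obtained minus predicted = {residual}")
```

With these made-up figures only school C is flagged, because its obtained mean lies well above the value predicted from the correlate. A report for superintendents would present such a result in plain language rather than as a residual.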

Results That Do Not Support Stake-Holder Bias 

Precedent to a decision to evaluate the impact of any program or 
policy is the decision that created the program or formulated the policy. 
This decision usually reflects some belief of the policy makers that the 
course of action being pursued is appropriate and therefore should yield
beneficial results. In essence, the stake-holders have evaluated, with
positive results, the course of action before pursuing it. Consequently,
when formal evaluations result in findings contrary to the policy makers'
bias, the evaluator, and his/her findings, may encounter considerable
resistance. The ego involvement of a policy maker may be so great that
either revising an existing policy or formulating a new policy contrary to
the previously selected position represents a cost which is too great to be
paid. In this case the policy maker may simply disregard the evaluation
results and proceed on the previously established course. On the other
hand, the initial rejection of the findings may, with time, give way to a
change in position. I have observed this shift occur so gradually that it was
virtually impossible to determine the point at which the change was made.
(This seems to have implications for studies of evaluation utilization. In 
what time-frame must the results be used in order to consider them 
"utilized"?) 

For several years the SCDE supported the implementation of 
extended-day kindergarten programs in the belief that such programs were
efficacious in developing "readiness" for first grade. When the results of a
third-party evaluation indicated non-significant differences between
half-day and extended-day programs in this regard, the findings met with 
considerable resistance and unjustifiable questions were raised about the 
technical quality of the study. Gradually, however, the program 
administrators have shifted their position on the issue. The point to be 
made is that it may be unreasonable to expect negative evaluation findings
to be immediately and warmly embraced by those who are stake-holders in
the program. Perhaps the best that can be hoped for is that when decision
makers are given both information and time, reason will prevail.

Lack of Credibility

As stated earlier, non-utilization of evaluation findings due to lack of 
credibility may possibly be attributed to one of two reasons: (1) the
evaluation was poorly conceived and/or conducted, or (2) the agency
reporting the results lacks credibility with the potential users of the data.
The only apparent defense against the first of these is to conduct other
evaluations and not make the same mistake twice. Unless, however, the
services of a different evaluator are utilized, the results may still be
viewed with suspicion by the target audience.

In the second instance (lack of credibility), there is, again, little that
can be done after the fact. Prior to engaging in any evaluation effort, the
agency should carefully consider whether the potential users of the data
are likely to consider the results to be objective and unbiased. When the
evaluation function is centralized within a department of education and,
consequently, evaluators are not answerable to program administrators,
evaluation results may have credibility within the agency itself. On the
other hand, when viewed externally, even a centralized evaluation function
is still a part of the total agency and results may be suspect. Unless the
results of the evaluation are solely for internal decision making, this
probably argues against having any state department of education evaluate
programs which it administers.

OTHER FACTORS INFLUENCING POLICY AND EVALUATION



Although the influence which special interest groups have upon the 
formulation of policy and, consequently, upon evaluation has been
mentioned earlier, the magnitude of this influence is such that it seems to
warrant some special attention. Policy is not formulated in a vacuum.
Policy is formulated only in response to external influence or, more to the
point, in response to some real or imagined need. In my opinion, any
discussion of policy/evaluation interaction which does not consider those
external influences ignores the force which drives the system. Regardless
of whether a policy is developed by the Congress, the State Legislature,
the State Board of Education, or the State Superintendent, it appears that
the policy is always in response to the stated or inferred needs or desires of
some group. For example, members of the legislature, as duly elected
representatives of the public, enact statutes (educational and otherwise)
which, at least in theory, reflect the desires and interests of the electors.
The State Board of Education then translates the education-related
statutes into rules and regulations which are administered by the State
Superintendent of Education. In turn, local boards of education may
develop their own policies designed to implement those imposed upon them.
Any of the policy-making groups (the legislature, the State Board of
Education, or local school boards) may generate policy in response to their
own constituents without being motivated to do so by one of the higher
level policy-making bodies. In any case, however, the link between
external influences and policy seems obvious.

This translates, then, to a system of interaction that might be
portrayed as in Figure 2. The implications are that special interests
directly influence policy and, indirectly, evaluation; policy directly
influences evaluation; and evaluation may influence policy either directly
by feedback to the policy-makers or indirectly through feedback to special
interests.




Figure 2.

Unfortunately, the process is not nearly so straightforward as the 
preceding paragraphs and Figure 2 may imply. There seem to be at least 
two sources of "noise" in the system that have implications for evaluation 
and the use of evaluation findings. First, there are usually competing
interests at work any time that policy (law, rule, or regulation) is
established. Consequently, compromise may be made in framing the policy
with the result that no one is completely satisfied with the outcome. If
this is true, various "stake-holders" may seek quite different results from
evaluations or at least may have different biases toward those results.
Second, even if the policy is a precise statement of the expressed desires
of the interest group, the underlying intent may not be reflected in the
policy. This may lead evaluators down the primrose path to asking the
wrong questions (or at least not enough questions) if they mistakenly
assume that a policy is addressing an educational issue when, in fact, the
force behind the policy was more of a social, political, or economic
nature. The earlier reference to the question of reducing pupil/teacher
ratio provides an example of the pitfalls which can be encountered as a
result of either ignoring or failing to recognize the motivation for a
particular policy.

In concluding, it seems desirable to provide an example of the special
interest/policy/evaluation interaction. In using the South Carolina Basic
Skills Assessment Act of 1978 as an example, I am aware that my
perceptions constitute only one version of the "truth."

The legislative history of the Act began in 1977 when two
representatives, both members of the Black Caucus, co-sponsored a bill
calling for the implementation of a program of grade-by-grade promotion
based on achievement test scores. The bill provided that the State
Department of Education would select the tests and determine the required
scores. The proposed program was to be implemented over a twelve-year
period beginning with grade one and adding a grade each year until all
grades were included. The bill did not receive favorable action in the
General Assembly but it did generate sufficient interest to result in a joint
resolution creating a special committee to study the issue of minimum
competency in the basic skills.

In its report to the General Assembly, the committee included a draft 
of proposed legislation.5 Although over eighty amendments were
introduced during the passage of the bill in 1978, the law is essentially the
same as the bill proposed by the committee. There is, however, one major
difference. The committee's proposal that an eleventh grade test of "Adult
Functional Competency" be required for a high school diploma was deleted
prior to passage of the act. What began in 1977 as a move to require
grade-by-grade promotion based on testing culminated in 1978 with a
required testing program to be used for the diagnosis of student
deficiencies and as a basis for providing basic instruction to assist students
in overcoming those deficiencies. Major compromises, the result of the
interests of various groups, were reached along the way. 

What motivated the introduction of the original bill in 1977? The most
obvious answer is that it was simply an extension of the national trend
towards competency testing. The prime sponsor of the bill says that such
was not the case. According to the sponsor there were two factors which
prompted him to introduce the legislation. First, his constituents were 
concerned that many of their children were only semi-literate upon 
graduation from high school and they were seeking a remedy for that 
situation. Second, the legislator had access to data collected through the 
SCDE-operated Statewide Testing Program which indicated that, on the 
average, the achievement scores of minority students were considerably
below those of the majority. He attributed these differences to poor
teaching and social promotion. He apparently perceived promotion as a
function of testing to be a remedy for the situation.

The Basic Skills Assessment legislation is a rather significant policy 
statement concerning evaluation. Apparently it came into being as a result 
of the influence of special interests and the use of available evaluation 
data that supported the concern of the interest group. 

Conclusion

This chapter was written for the purpose of providing a non-academic
perspective of the policy/evaluation interaction in a state education 
agency. While I am confident (at the 99 percent level) that it is 
non-academic, I am not equally confident that it provides a "true" picture 
of the interaction which was the subject of the chapter. As stated earlier, 
others may, and probably do, have quite different perceptions of the ways 
in which policy and evaluation interact, or fail to do so, within the SCDE.

FOOTNOTES

1. Code of Laws of South Carolina, 1976, Chapters 3, 5, 20, and 30.

2. The State Superintendent recently initiated a management review
of the agency for the purpose of determining whether some organizational
changes may be desirable in order to more effectively and efficiently
fulfill the responsibilities of leadership, service, and regulation. As a
result of the study, the organizational chart (Figure 1) may be inaccurate
by the time this chapter appears in print. With that possibility in mind, I
have attempted to keep my observations specific to the agency (as
requested) while at the same time keeping the interpretations general.

3. Act No. 187 of the 1979 Acts and Joint Resolutions of the S. C.
General Assembly.

4. Among the common problems encountered by evaluators are those
created by insufficient time and financing. If it had been possible to
conduct a longitudinal study of the students involved in the program, the
results might have been different. Although the educational system is
generally concerned with long-term benefits, evaluation is frequently
restricted to the examination of immediate outcomes.

5. Report of the Special Joint Education Committee to Study Minimal
Competency in Basic Skills, 1977. The Special Joint Education Committee
to Study Minimal Competency in Basic Skills was created by Part II, Section 31,
of Act 219 of 1977.



CHAPTER 5 

The Wisconsin Experience 

James H. Gold 



Although educational policy and evaluation have been a part of the 
American education system from its inception, the content, form, and 
relationship between them have changed throughout the years. Early
school policy was governed by concentration on the 3 R's, and educational
evaluation was based on the effectiveness of the individual teacher. In
contrast, schools today have expanded programs far beyond the 3 R's in an
effort to provide a more comprehensive education to greater numbers of
students. The public has charged schools with the responsibility of
addressing, and ameliorating, social problems. Accompanying this 
expansion of responsibility is an increase in public dollars from local, state, 
and federal sources. As costs have risen and resources have become less 
available, funding agencies such as private foundations, state education 
agencies, local education agencies, and the federal government have placed 
increased importance on promoting and funding educational activities that 
encourage desirable student behavior. Thus, program evaluation has
become an increasingly important part of general school operations. 

Specifically, the passage of the Elementary and Secondary Education 
Act of 1965 propelled the evaluation movement by requiring evaluation of
State Educational Agency (SEA) and Local Educational Agency (LEA)
programs supported by federal funds. This concept has grown horizontally
to other federal programs and vertically to state and local programs.
Subsequently, evaluation and policy making have become an integral part
of management concepts such as PPBS, management by objectives, and the
more generic concept of accountability. Although variations exist in
evaluation philosophy and application, a common theme is that feedback
about behavioral changes resulting from program evaluations should be the
basis for both policy and operational decision making at the program level.

The binding of program evaluation to policy making is based on the
notion that the scientific process which encompasses the evaluation
process would solve the problems of educating American youth just as the
same processes were able to put an American on the moon in the 1970's.
Thus, the expectations were raised that a one-to-one relationship could be
developed between evaluation and policy making. 

Today, over a decade later, evaluators, educators, educational decision
makers, and other interested parties are faced with the reality that the 
one-to-one anticipated relationship has simply not evolved between policy 
and evaluation. Too often evaluation results are ignored in light of 
political expediency. Consequently, we are faced with the problem of 
improving utilization of evaluation information in policy development. 
This chapter will detail some of the issues and present how one state,
Wisconsin, is structured to relate evaluation to policy making. 

CONTEXT AND MODEL FOR EVALUATION AND POLICY

Contextual Factors

Prior to discussing the relationship between educational policy making 
and evaluation, it is essential to understand some of the important 
contextual factors within which policy making and evaluation operate in
education. These factors are not new but are a reminder that the
educational enterprise is dynamic, conducted by humans who possess the
strengths and frailties which determine the outcome of all human
endeavors. Systems are made up of people who should be accountable for
results rather than for the failure of the "system" itself. Thus, the
following contextual factors are presented as a framework within which
most policy making and evaluation take place. 

Contextual Factor 1. Decisions in education are made by
influencing those who have the final decision-making authority
invested in them by state constitutions and laws.

Contextual Factor 2. Education is a political process which is
strongly influenced by individuals and groups who are affected by
the decisions. Their vested interest may conflict with the
welfare of others.

Contextual Factor 3. Education is an enterprise in which people
with diverse values must agree to live with a single set of
policies and operations within an educational system.

Contextual Factor 4. Education is not an exacting science in
which success or failure can be precisely predicted for any
particular policy or program. Thus, a single best policy or
program may fail to emerge.

Contextual Factor 5. Translation of programs and operations
from policy may bear little resemblance to the intent of the
original policy.

Contextual Factor 6. Both individuals and groups are often vying
for limited resources, which often leads to conflict and
competition rather than cooperation.

Contextual Factor 7. Evaluation conclusions are often
contradicted and refuted by those desiring other outcomes. They
may reinterpret data or present contradictory data which
supports their viewpoint.

Because of these contextual factors we must begin with the 
assumption that educational decision-making regarding policy, programs, 
and evaluation is not always rational but is often based on influence and 
compromise. This political decision-making process is inefficient but 
successful in a multi-cultural democratic society in which public education 
is laid open, dissected, studied, and gradually reconstituted in an
evolutionary process. 

Within this context we can begin to fit together the dynamics of policy 
and evaluation, realizing each state is somewhat different in terms of the
power structure, values, and traditions which influence the
policy/evaluation relationship. The importance of this chapter is for the
reader to gain insight into the dynamics of other states so that he may 
better utilize evaluation results in his policy development process. 

A General Model

Educational policies are broad statements of intent which provide 
organizations with a basis for program design and implementation within a 
given State Education Agency (SEA). They provide a description of agency
direction to those outside the organization. Characteristically, policies
lack quantification, specific behavioral descriptions, and program
specifications. However, they usually do reflect a desired standard. For
instance, a policy may be to "promote equal educational opportunities for
all children in the state." This statement reflects a standard but does not
indicate how it will be achieved or how one knows when it is achieved.
These specifics are accomplished through the development of goals, 
objectives, programs, and program evaluation.

The process of generating policies in an SEA is complex and varies
between SEAs. However, it does appear possible to develop a general
model that reflects the dynamics of decision-making in most SEAs.

Figure 1 is such a model and is designed to be flexible in order to 
accommodate the variations which occur in SEAs. The general flow of the 
model shows that "data" raises policy issues, resulting in policy adoption,
related programs, and evaluation. The evaluation results are then used in
revising policy, programs, and even the evaluation itself.

The model begins with the disclosure of "data," which strongly
suggests that either current policies be revised or new policies be
developed. At the very minimum, the "data" raises serious questions as to
certain unmet needs which must be addressed. "Data" in this model has
two forms. First, people express their concerns based on their own
experience as parents, educators, students, employers, and taxpayers.
Although these data are neither systematically collected nor scientifically 
analyzed, they can have a powerful effect on educational policy if a
consensus of opinion is gained and the opinions are expressed in a loud and 
clear fashion. This type of data can be more powerful in bringing about
change than even the best evaluation studies.

The second type of "data" is more systematic, consisting of test 
scores, surveys, and research. Sometimes these studies are carried out to 
reinforce or change policy, while at other times their influence on policy is 
accidental. 

It is important to note that in most cases the two forms of "data" are
used to complement each other. For instance, many people expressed
concern over students not learning the basic skills. This notion was then
reinforced by reports of declining test scores and other "hard" data. Thus,
most state departments have developed a stronger and more visible policy
concerning basic skills. 

It is interesting to note that data has several points of entry into the 
policy development process. The Chief State School Officer (CSSO), 
governor, legislature, state board of education, and SEA staff are all viable 
candidates for influencing policy. Depending on the personal policies of 
each and their relationship to each other, the entry points used are based 
on receptivity to change and the power to change. If policy change is 
desired, it is most important to analyze the actors and select those who are
receptive and willing to work for the desired change.

Figure 1. General Decision-Making Model

Once the data is revealed, policy issues are determined by a number of
different people. Professional groups, legislators, the governor, CSSO, and 
SEA staff offer various policy alternatives. Each of these groups may 
prepare papers on policy issues, appear at public hearings, or attempt to 
persuade others on an individual basis through rational analysis, emotional 
appeal, or political compromise. The end result is either no policy or a
policy which is agreed upon by the SEA, legislature, governor, or any
combination of the three.

Policies are then translated into programs through the development of 
goals, objectives, activities, and budgets. The legislature often goes 
beyond policy by determining some program specifications. Such activity 
can create conflict between SEA and legislators. It is at this point that 
lobby groups also work to ensure that the funds are allocated and activities
are designed to meet the needs of their constituency. Thus, these groups
exert strong influence on both SEAs and legislatures during this phase of
the process.

State programs are given varying degrees of autonomy in determining 
evaluation procedures. For some states program evaluations are required 
but the design is left up to the SEA. Other programs have the evaluation 
specifications spelled out either with general guidelines or specific 
activities. Regardless of the form, it appears that states are increasingly
required to evaluate educational programs. 

Evaluation results can influence virtually every phase of the model. 
People's perceptions could be changed through the new information; thus the
"data" base changes. The evaluation could cause a re-evaluation of policy
or raise new policy issues. Certainly the management, organization, goals, 
objectives, activities, and budget of a program could be affected. Whether 
any of these changes take place depends on the processes developed for 
handling data and the desires of those who control the data. 

FEDERAL INFLUENCES ON EVALUATION POLICY

Since the passage of the Elementary and Secondary Education Act of 
1965, the federal government has had greater influence on the policies and
evaluations of state and local education agencies. The federal government
of the 1960's began to intervene directly to overcome some of the 
large-scale social problems of the decade. Education was among the many 
social programs affected. The major mechanism of change was the 
injection of massive amounts of funds into state and local education
agencies for the purpose of designing and implementing programs that met
the needs of society.

In handing out money for specific programs, the federal government 
began to influence state and local district policies. For instance, 
acceptance of Title I funds increased state and local districts' commitment
to improve the education of the disadvantaged. Although most states had
some commitment to this policy, the Title I funds increased that
commitment and made it very visible. Likewise, the original Title III
greatly influenced policies for innovative programs and expanded the policy
of publicly documenting educational needs at a statewide level. Still other
programs were responsible for implementing new management concepts
and placing an emphasis on educational planning on a statewide level. At
the SEA level, many, if not most, of the current central planning units had
their origins in ESEA funding. Thus, the federal
government, through massive funding attached to specific programs, has 
had a great impact on educational policies at the state and local levels. 

One of the key areas of influence is in evaluation of programs. A
prime example is Title I, in which evaluations have gone from being locally
designed to following more rigorous federally mandated requirements. At
the outset, local districts were permitted great leeway in how their
program evaluations were designed and implemented. The goals and 
objectives were strongly encouraged and the evaluation instruments were 
left up to the local district. As a result, evaluations ranged from excellent 
to totally inadequate. As the inconsistency in quality became more 
evident, the federal guidelines for Title I evaluations were restricted to
make them more consistent with good evaluation practice. 

In addition, Congress began to question seriously the effectiveness of
Title I funds. This questioning led to changes in evaluation requirements
and subsequently to evaluation practices. When faced with measuring the
impact of Title I, educators were unable to aggregate Title I evaluation 
data from across the nation and gauge their effectiveness. Instead, case 
studies and anecdotal data were used to defend or attack the massive 
expenditure of funds. Subsequently, Congress mandated that a method be 
developed to report on the impact of Title I to Congress. 

As a consequence, Title I developed four models for evaluation which
generated data that could be aggregated at the state and federal levels. 
This strategy limited the evaluation instruments that were required (LEAs 
could supplement), the sequence of evaluation events, and, to a certain 
degree, the content required to be evaluated. In essence, the Title I 
requirements limit the required evaluation strategies, curricular content, 
and test instruments to those that it specifies. 

LOCAL INFLUENCE ON POLICY DEVELOPMENT

Local individuals and groups can influence policy decision-making 
through several mechanisms. Lacking definitive data on how effectively 
these mechanisms influence policy, the reader must draw upon his own 
experiences to judge the value of each. 

Individual Personal Contact

In this case the individuals may call, write, or meet with SEA
management, legislators, or the governor and express their opinions. For
people viewed by government as opinion leaders, this type of contact is
valuable. Otherwise, a large quantity of responses is needed to influence
policy decisions.

Specific Issue Groups 

There are ad hoc groups which pool energy and resources to change
specific policies or procedures. Their interest is in a single topic and
they use personal contacts, letter campaigns, and media as their major
modes of operation.

Task Forces and Advisory Groups 

Historically, the governor, legislature, and SEA have established task 
forces to review specific problems and make recommendations for policy
and programs. These task forces are usually appointed by the governor 
and/or agency head, with the basis of selection not always clear. In most 
cases, the local constituency is represented. However, the individuals 
selected often hold views similar to, or at least not incompatible with, those
of the appointing authority. Thus many, but not all, task forces have a built-in
bias.

Organizations 

The local constituency is usually well represented by various
professional organizations who have been increasingly involved in political 
lobbying efforts to serve the needs of their members. Teachers, 
administrators, and business officials all have their representatives who
monitor and influence educational policy and programs at the state level. 
In Wisconsin, some of these groups have formed an umbrella group with the 
SEA for the purpose of discussing major policy and program
considerations. Although the group has no formal authority, an
overwhelming consensus on an issue, policy, or program would have a great
influence on those who made decisions.

In Wisconsin, local influence has had a general effect of maintaining
local control where the federal or state laws have not compromised it. In 
evaluation, most advisory groups opt for leaving the design and 
implementation up to LEAs and requiring as little extra work as possible to 
accomplish evaluation requirements. This position is partly due to the 
issue of control, but may very well reflect a feeling on the part of LEAs 
that evaluation data either are not, or cannot be, utilized enough to justify
increased demands on the time, energy, and money of the local district's
staff. 

EFFECT OF STATE POLICIES ON EVALUATION

State educational policies affect evaluation in three ways. First, the 
policies may determine that no evaluation take place. This is usually 
accomplished by leaving the requirement for evaluation out of legislation 
and the budget. Thus, the programs are implemented and a general fiscal
accounting is done, but no performance evaluation takes place. 

Second, legislation and/or budget documentation may be very
prescriptive in determining the evaluation policy and procedures. In such
cases the evaluation requirements are often spelled out in detail
regarding: (1) process, (2) instruments, (3) time-lines, and (4) reporting
requirements. This situation places severe limitations on the SEA, but
increases the probability that the legislature will have its evaluation policy
carried out.

Third, the legislature/budget may require that the SEA evaluate specific
programs that are being supported by state funds. The directive to
evaluate is often vague, sometimes ambiguous, and always open to
interpretation as to legislative intent. Procedurally, the CSSO assigns the
program to an individual to administer. The nature of the program and
evaluation will then be determined by the personality, politics, professional
persuasion, and program priorities of the program director and his or her
superiors who have veto powers.

Like individuals, organizational units have their own personalities, 
politics, and program priorities. Thus, central evaluation units tend to be 
interested more in performance-based evaluations, using surveys and
objective tests, while instructional people often emphasize process
evaluation, using interview techniques or other methods which place less
dependence on student performance. The nature of the evaluation, then,
may be established by organizational assignments since an overall agency
evaluation policy is often absent.

INFLUENCE OF EVALUATION ON POLICY 

Educational evaluation has the potential for influencing four aspects 
of the educational enterprise. First is the establishment of programs based 
on evaluation of need. Once concerns are expressed as shown in the earlier 
decision model, the collection of systematic evaluation data may indicate 
the degree to which the concern is real. This use of evaluation could set 
the course for the content, process, and extensiveness of SEA programs. 
The results of these evaluations can directly affect the amount of fiscal 
and human resources made available to address the concerns. 

Second, the management and operations of ongoing programs may be 
modified as a result of formative evaluation. These changes usually affect 
the activities of staff and students, but avoid any major changes in overall 
policy, missions, goals, or objectives. These changes are intended to 
promote more effective and efficient attainment of the original goals and 
objectives. 

A third aspect of evaluation is program monitoring, which is related to 
management and operations, but has the intent of insuring that the 
proposed program is the program being carried out. Program changes must 
be documented, verified, and justified. 

The fourth area involves policy changes which determine the 
continuation of the program. Programs may be discarded because the 
evaluation shows them to be ineffective, inefficient, or politically 
unnecessary. Evaluations may indicate that the goals and objectives are 
both reasonable and based on desired standards. However, the end results 
simply may not meet the standards sufficiently to warrant continuation of 
the project. It is also possible that, even if the goals and objectives are 
being met, the cost in dollars and human commitment is too great for the 
outcome. Finally, when the evaluation results are placed in the larger 
context of total organizational programming, other priorities may 
supersede the project as a result of changing needs and perceptions on the 
part of administrators and the public. Thus, the evaluation may contribute 
to the expansion, maintenance, or termination of existing programs. 

The final policy decision involves the choosing of one program 
approach over another, and the general significance of the evaluation 
results. As a result of a program evaluation, an organization may choose 
to drop one program and adopt and/or expand a program that the 
evaluation has shown to be more effective, efficient, and/or politically 
acceptable. 

In reality, evaluation studies have a relatively small impact on policies 
in comparison to their impact on program operations. This situation exists 
because of the nature of educational policy, the politics of the educational 
enterprise, and the current state of the science of evaluation. 

As indicated earlier, educational policy is usually stated in broad and 
abstract terms which foster multiple interpretations of goals, objectives, 
programs and evaluation. Since most evaluations are designed to measure 
program objectives and activities, it is little wonder that policy is barely 
touched. In addition, most policies are robust enough to withstand 
significant program changes without requiring policy changes. 

Statewide educational policy is usually very appealing to the public 
and appears to be in the same unassailable category as "Chevrolet, apple 
pie, and the American flag." Within this context much public policy is 
traditional and insulated from rapid and extensive change. This stability is 
due to the balance of power between traditionalists, moderates, and 
liberals who influence and make policy decisions. 

This balance of power also explains, in part, the imposing role that the 
federal government and courts have played in bringing about both policy 
and program changes at the state and local level. During the last 25 years 
the government and courts have been liberal regarding social policy and 
programs, and have created policy and program changes at the state and 
local level by legal mandates and by the infusion of large grants for 
educational programs. For example, statewide policies concerning equal 
educational opportunity, school desegregation, school finance, and library 
building programs were all changed dramatically because of federal 
intervention. Even curriculum policies and programs have been influenced 
dramatically by the creation of the National Defense Education Act 
(NDEA), Title III, and Title IV. It is important to note that the federal 
activity came about as a result of public concern and a national feeling 
that the policies and programs were necessary for the public good, not 
because a comprehensive evaluation concluded the changes were 
imperative. Society expressed concerns and the federal government and 
courts responded by establishing programs and policies that addressed them. 

Evaluation has had a less than desirable effect on policy because 
evaluation results are often inconclusive or contrary to previous studies. 
The technology of evaluation is not perfect and contains the bias of both 
the evaluator and the program staff which focuses on specific aspects of 
the program while minimizing and/or ignoring others. Even comprehensive 
evaluations have errors of both content and design which allow opponents 
to criticize the evaluation and discredit the results on technical grounds. 

Similarly, educational evaluation has not produced insightful, 
permanent, and significant discoveries which would revolutionize the 
enterprise as is true in other fields. Evaluation discoveries have no 
analogies to X-rays, penicillin, or the electric light bulb. Instead, 
education has a series of fads such as management by objectives, 
programmed instruction, and the open school, which lose their luster after 
a relatively short period of time or are refuted by contrary research within 
a decade. As a result, educators and evaluators have failed to create the 
public trust which other fields have developed through finding permanent 
and effective ways to address public concerns. 

Finally, the traditional view of organizations having specific missions 
towards which all their human and fiscal resources are devoted is no longer 
appropriate. Factors such as limited resources, controversial policy issues, 
influence of special interest groups, public politics, organization politics, 
and the openness of the democratic process have led to decisions that are 
less than optimal in regard to the organization's mission, but more 
practical in that they try to satisfy all the variables listed above so that 
programs and policies can be implemented. Thus, one should be little 
surprised when parents and others express dismay over their perception 
that the children seem to have been lost in the decision-making process, 
while the survival of individuals and organizations appears to have been 
optimized. Although evaluators should be concerned about students, in 
reality, evaluation results may be neither necessary nor effective when one 
or more of the above factors is given higher or even exclusive priority in 
decision making. 

STRUCTURE AND FUNCTIONS OF POLICY AND EVALUATION 
WITHIN THE WISCONSIN DEPARTMENT OF PUBLIC INSTRUCTION 

Department Organization 

The Wisconsin Department of Public Instruction (DPI) is unique in that 
the state superintendent is both the major policy maker and administrator 
for education in the state. The CSSO is a constitutional officer elected in 
a popular, non-partisan election every four years. There is no state board 
of education or any other state structure which supersedes the 
policy-making authority of the office. Thus, the CSSO is accountable only 
to the public every four years. 

Implementation of policies and operations is accomplished in two 
ways. First, Wisconsin is under a biennial budget system with a budget 
review occurring on the off year. The budget is a mix of fiscal and program 
elements. However, the budget has been used increasingly by state 
agencies, the legislature, and the governor for developing or changing policy. 

The DPI creates a budget which is submitted to the Department of 
Administration for review, after which the governor makes 
recommendations to the legislature. Both budgets are reviewed and 
modified by committees and finally adopted as a total state budget. The 
governor does have line veto powers which can be overturned by a 
two-thirds vote of the legislature. This process generally follows the 
model described in the previous section and is greatly influenced by various 
special interest groups. 

Organization of DPI, shown in Figure 2, consists of the state 
superintendent, an appointed deputy, and five appointed assistant 
superintendents, who serve at the pleasure of the CSSO. The department is 
divided into five major divisions including Financial Aids, Handicapped 
Children, Instruction, Library, and Management, Planning and Federal 
Services. 

An Administrative Council made up of the CSSO, deputy CSSO, and 
five assistant superintendents reviews major policy changes. In addition, 
the CSSO confers with individual assistant superintendents, program staff, 
and numerous task forces and advisory groups for direction concerning 
policy and operations. However, final decisions regarding policy and 
implementation are the sole responsibility of the CSSO. 

Evaluation Structure and Functions 

Housed within the Division for Management, Information, and Federal 
Services is the Bureau for Evaluation, Planning, Information, and 
Research. This is considered the central evaluation unit of DPI and is 
divided into three units, two of which are involved directly in program 
evaluation. The unit entitled Educational Planning, Evaluation, and 
Research consists of six persons who have varied responsibilities. Some 
design and implement evaluations for other DPI programs such as special 
education and nutritional education. Others review and monitor local 
evaluations for Title IVC projects. Some have been involved in a statewide 
needs assessment which had potential for statewide policy development. 
For the most part, the group is "on loan" to provide services to others who 
lack staff to fulfill their evaluation needs. This group also advises local 
districts on the design and implementation of local evaluation programs. 

The second unit is the State Assessment Program, which conducts an 
annual statewide assessment of pupil performance. This unit has a staff of 
six. In addition to statewide testing, this unit has been instrumental in 
developing a local option testing program and is in the beginning stages of 
developing an item bank to be used by LEAs. 

Evaluation Methodologies 

The basic methodology used in evaluation is that of establishing 
outcome objectives and measuring the attainment of those objectives for 
each program evaluation. In addition, some process evaluation may take 
place to insure that the program proposed was in fact the program 
evaluated. Achievement tests, interviews, and questionnaires have all been 
employed as data-gathering tools. 

Problems and Constraints 

Lack of Clarity Concerning What Clients Want from Evaluation. 
Many clients come to evaluation without much knowledge of 
what questions they want the evaluation to address. 
Subsequently, a great deal of time must be spent on clarifying 
goals and objectives. To some clients this is both a tedious and 
often threatening task. 

Attitude. Some clients are afraid of evaluators because they are 
intimidated by them and/or feel they are being personally 
evaluated. Such feelings hinder the evaluation process in that 
people either become resistant to the process or agree to things 
they later reject under the pretense that they did not understand 
them. 

Minimal Effort. Clients are often being forced into evaluation 
to hold on to their funding, and, therefore, desire to do the 
minimum to meet the requirements. They are just going through 
the motions. 

Communications. Evaluator and clients often fail to communicate 
ideas and assumptions; thus, a common understanding does 
not exist. Such communication gaps are caused by both language 
differences and varying degrees of receptivity to ideas and 
viewpoints on the part of both the evaluator and client. 

Use of Evaluation Results. After the time and money are 
invested, after fulfilling an obligation to evaluate, clients do not 
utilize results adequately. There is resistance to pre-planning 
the use of results and a propensity for a "wait and see" attitude. 
However, most evaluators do believe that results are not used in 
a systematic and visible fashion. 

Difference in Philosophy. Differences in evaluation philosophy 
exist among educators, especially in terms of the "hardness" or 
"softness" of data required for adequate evaluation. In the 
Wisconsin DPI, as in most agencies, there are those who place 
greater emphasis on process rather than outcomes. Thus, 
conflict arises as to how much reliance should be placed on 
student performance data versus other types of information. If 
an evaluator who is a strong believer in performance data is 
"loaned" to a program whose personnel tend to believe more in 
process or other data, the evaluator is constrained in providing 
services as he or she believes is appropriate. 

PUPIL ASSESSMENT: A CASE HISTORY 

In 1971 the Wisconsin legislature, with the support of the DPI, enacted 
S.115.28(10), which mandated that the department establish a pupil assessment 
program within very broad guidelines: 

Develop an educational program to measure objectively the 
adequacy and efficiency of educational programs offered by 
public schools in this state . . . Assessment shall be undertaken 
at several grade levels on a uniform statewide basis. 
(S.115.28(10)) 

Unlike in other states, this legislation passed with a minimum of debate 
and little organized opposition. It received neither wide press coverage nor 
special interest group attention. Most importantly, the legislature did not 
provide any state funds for initiating the program, which may account for 
the lack of interest among law-makers and educators regarding the passage 
of the bill. Thus, the DPI was mandated to provide a program of pupil 
assessment, but was given little legislative guidance, interest, or funding. 

Fortunately, the state superintendent was committed to the concept 
of accountability and therefore allocated discretionary federal funds for 
starting the program with the intent that the state would eventually take 
over its support. This became a reality in the following biennium, when the 
state provided full support of the program. The state has increased 
allocations for the program each biennium since. 

From 1971-75 the assessment program developed a set of goals for 
education, and implemented assessments in reading, math, science, and 
social studies. The Eleven Goals for Education were intended to be the 
basis of the assessment and were to provide direction for education in 
Wisconsin Public Schools. The test instruments were all objectively 
referenced, and put together by Wisconsin educators, or selected from the 
National Assessment of Educational Progress (NAEP). They were 
administered to a random sample of public school students based on a 
two-stage random sample design. 
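
A two-stage design of this kind can be illustrated with a brief sketch. The source does not describe the sampling frame, the units drawn at each stage, or the sample sizes, so the sketch assumes that schools are drawn first and students are then drawn within each selected school; the function name, the roster, and the sizes below are all hypothetical.

```python
import random


def two_stage_sample(schools, n_schools, n_students, seed=None):
    """Draw a two-stage random sample (illustrative assumption only).

    schools: dict mapping a school name to a list of student identifiers.
    Stage 1 draws schools at random; stage 2 draws students at random
    within each selected school.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of schools (sorted so a seed is repeatable).
    chosen = rng.sample(sorted(schools), n_schools)
    sample = {}
    for school in chosen:
        roster = schools[school]
        # Stage 2: simple random sample of students, capped at the roster size.
        k = min(n_students, len(roster))
        sample[school] = rng.sample(roster, k)
    return sample


# Hypothetical roster: 5 schools with 30 students each.
roster = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(5)}
picked = two_stage_sample(roster, n_schools=2, n_students=10, seed=1)
```

Sampling in two stages keeps test administration confined to a manageable number of schools, at the cost of somewhat less precision than a simple random sample of the same number of students.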

During the first four years, several events did and did not take place. 
No definitive purpose for the assessment was established or documented. 
There was little consideration given to the nature and type of data to be 
collected. There was little consideration of persons responsible for content 
and technical quality of the instruments. No one was designated 
responsible for the content of the reports. Thus, an internal tug-of-war 
began between assessment personnel, the specialist, and management 
personnel of other divisions. As a result, until 1976, products were deemed 
inadequate by the assessment director. Many policy makers outside of the 
DPI questioned the value of the assessment data. 

At the end of the 1975 assessment, an internal evaluation of the 
assessment program was done by the assessment staff and a Technical 
Advisory Committee. The evaluation concluded: the sampling procedures 
were excellent, the logistical systems for administering the program were 
excellent, the test instruments needed some refinement, and the program 
did not seem to meet the need for student information. Thus, an 
assessment program had been created that operated well, but did not 
satisfy needs of educators, lawmakers, or the general citizenry of 
Wisconsin. 

As a consequence, the CSSO directed the assessment staff to 
accomplish the following: 

1. provide the citizenry of Wisconsin with a statewide profile 
of the quality of education as reflected in students' ability 
to demonstrate expected knowledge, skills, and attitudes 

2. provide state officials with student performance 
information for use in educational policy development 
and/or communicating with constituents 

3. provide state officials and the general citizenry with a 
profile of Wisconsin pupils' performance as compared to a 
national average 

4. provide school districts with the opportunity for 
self-evaluation, using the methodology and products of the 
Wisconsin Pupil Assessment Program 

In addition, the CSSO directed that public involvement be significantly 
increased in all phases of the assessment. This change resulted in the 
assessment of practical skills and knowledge in addition to the purely 
academic objectives the assessment had previously focused on. 

The changes in policy and procedure were consistent with the general 
policy of the CSSO regarding local control. The assessment was to be 
developed for, and by, the public with technical assistance provided to 
local districts on a voluntary basis. The assessment design was a "safe" 
compromise which showed DPI was concerned about accountability and yet 
was not demanding enough to pose a direct threat to any particular special 
interest group. In summary, the assessment was a reactive program 
designed to quell concerns over student performance. Its purpose was to 
provide public information and not to comprehensively evaluate specific 
educational programs in Wisconsin. 

Although the yearly results made interesting newspaper coverage, it is 
difficult to identify whether the assessment has been instrumental in 
changing policy. No new programs have been initiated and no new funds 
have been generated by the assessment results. There is scant evidence 
that the education of Wisconsin children has improved, or that any 
improvement which may have occurred would be due to the assessment. 
On the positive side, there is evidence that local districts using the local 
option program have used it for program improvement. This effort is 
encouraging, since districts are attempting to find ways to make evaluation 
figure more prominently in policy making. 



HEURISTICS FOR INTEGRATING POLICY AND EVALUATION 

For the purpose of clarity it is necessary to draw a distinction between 
local project evaluations with limited local implications, and statewide 
evaluations which have potential implications for the entire state 
education system. Policy questions are more easily addressed in projects. 
The appropriateness of a decision can be judged against the concerns and 
needs of a relatively small group of people instead of judging whether a 
particular policy or program is appropriate for 436 LEAs which have both 
common and unique needs. Consequently, policy may be more 
controversial at the statewide level than in a single LEA. 

The sheer size of a statewide evaluation and decision-making process 
opens complicated political channels which are often difficult to control. 
The power structure may shift with the policy issue in statewide processes, 
whereas, at the local level the power structure appears to be more stable. 

Statewide evaluations are usually carried out by SEA staff, while local 
project evaluations may involve SEA staff, local staff or outside 
consultants. Statewide evaluations require an internal evaluator, whose 
role may be different from an outsider's. The internal evaluator begins 
with a particular status in the organization and is, in all likelihood, less 
prone to deviate from the organizational norms of communication, 
attitudes, and innovation. This individual is already part of the 
organizational structure and, in all probability, has been pigeon-holed into 
specific political and philosophical categories, making it less likely that the 
organizational staff will view the evaluator as unbiased. The SEA 
evaluator must then make a conscious decision to either facilitate the 
evaluation process by providing technical expertise, or take an active role 
in the politics of the situation, thereby influencing the design, 
implementation, and utilization of the evaluation. 

Suggested heuristics for improving the integration of policy and 
evaluation at the project level follow. 

Project Evaluation 

1. Make sure your credibility with the organization is 
established by providing documentation of work history and 
references from other evaluation projects you have 
conducted. Submission of an evaluation report which you 
have completed would also be valuable. 

2. Determine at an early stage both the formal and informal 
structure for decision-making in the organization, and deal 
with the appropriate decision-makers regarding the 
evaluation design and implementation. 

3. All communications should be done at a language and 
technical level appropriate for the audience. Do not use 
the average level but the lowest level in the audience so 
that you will communicate effectively with all people 
involved in the process. 

4. Be prepared to deal with people on a simplistic level. 
Assume responsibility for increasing the audience's 
knowledge of evaluation techniques and of how results can 
be utilized. 

5. Clarify the roles and responsibilities of the evaluator, 
staff, and administration. 

6. Do not go into an evaluation situation with preconceived 
notions of what the evaluation should do or how the 
evaluation will be conducted. Be sensitive to local needs 
since, in the final analysis, the best evaluation design will 
be useless unless it meets local needs. 

7. Do not take sides on local political issues, but act as a 
conciliator in bringing about compromise. Make 
suggestions and relate relevant research, but do not impose 
your viewpoint. 

8. Identify the purpose of the evaluation and the specific 
questions that need to be answered. This is essential since 
most clients do not understand what an evaluation can do 
or what they want. Be honest about what questions can 
and cannot be adequately addressed, and explain why. 
State how much staff time and what resources are required 
for answering questions. Do not let the client expect more 
than the evaluation or evaluator can deliver. Document 
the purpose of the evaluation and the specific questions to 
be addressed and have both the client and evaluator sign it. 

9. Develop the evaluation process plan. Include activities, 
responsibilities and timelines. This may be accomplished 
through developing options which the client chooses among, 
based on your analysis. This decision should be a team 
process so that the client will feel ownership and try to 
tailor the evaluation to the local situation. 

10. Develop an analysis plan that specifies how the data will be 
translated into answers to the evaluation questions. Avoid 
statistical jargon. Present the plan so that everyone 
involved will understand the process and the final outcome. 

11. Develop an interpretation and utilization plan that 
delineates who, how, and when the results will be 
interpreted. This step is critical in utilization, since most 
clients are inclined to let the results determine the 
utilization of data. By defining how specific desirable and 
undesirable outcomes will affect policy and operations, the 
client will probably be more committed to following 
through with specific actions. 

12. Develop a dissemination plan which targets evaluation 
results and recommendations to audiences in a form that 
they will read and follow through on. Produce technical 
and summary reports which convey the same information in 
different forms. In some cases dissemination may utilize 
alternative media such as transparencies, television or 
slide/tape presentations. 

13. Keep key decision-makers informed as to the progress of 
the evaluations and any unusual findings. Do not drop 
surprises on the client to which he or she is not ready to 
respond. 

14. Make sure the role and responsibility of the evaluator is 
clear in terms of information release. Do not release 
information without the client's approval. Requests for 
information should be directed to the client. 

Statewide Evaluation and Policy Interaction 

Most statewide evaluations are conducted by the SEA staff; thus, 
there is little client-evaluator conflict. However, the evaluator must 
interact with other government and public entities, finding the most 
resistance in intra-agency dealings. Thus, the heuristics presented below 
assume evaluation and policy interaction are internal agency activities that 
may involve extra-agency politics. 

1. It is essential that the evaluator have the full confidence 
of the agency's top management. Since statewide decisions 
are made by the CSSO and/or state board, it is imperative 
that the evaluator build a strong track record with them. 
The evaluator must produce evaluations that fit client 
expectations and are politically astute, flexible, and 
technically competent. 

2. The evaluator must build a positive relationship with those 
staff members who may be affected by evaluation 
outcomes. Agencies are generally resistant to change. 
Individuals may be directly threatened by evaluation, and 
may do everything in their power to overtly and covertly 
impede the evaluation effort. 

3. Evaluators should build relations with influential groups 
outside the agency and educate them as to how evaluation 
results can be used in making decisions. These influential 
people include budget analysts, legislative staff, 
professional organizational staff, and media people. 
Evaluators can educate such people by holding conferences 
and workshops for them, or by meeting with them 
individually to inform them of evaluation progress. Under 
no circumstances should such meetings take place if they 
conflict with an agency policy or rule. Likewise, the 
evaluator should not promote ideas which are contrary to 
agency policies, rules or regulations. The intent is to 
educate and build confidence in the evaluation process. 

4. The purpose and objectives of any statewide evaluation 
should be clearly delineated and approved by the CSSO and 
the evaluator. The evaluator should offer alternatives and 
recommendations as to what questions the evaluation may 
address, as well as an analysis of the policy and operational 
implications of each. 

5. An evaluation plan should be developed that includes 
activities, responsibilities, and timelines. This plan is then 
signed off by the CSSO and management staff who are 
involved. The signing of the plan represents a commitment 
to implement it. Management must insure that staff 
members carry out the plan even if it means reduction in 
other staff activities. 

6. An analysis plan should be developed to specify how the 
data will be translated into a format that will answer the 
questions addressed in the evaluation. This plan provides a 
mechanism for communicating what will and will not be 
done with the data. 

7. A utilization plan should be developed to indicate how the 
evaluation results will be used. Positive and negative 
results should be analyzed and potential actions described. 
Such a plan represents a public commitment to use data in 
specific ways. It may prevent the collection of extraneous 
information that is costly and inconvenient. 

8. Provide a dissemination plan that directs evaluation results 
and recommendations to target audiences in a form that 
they will read and follow through on. Produce technical 
and summary reports which convey the same information in 
different forms. In some cases, dissemination may utilize 
alternative media such as transparencies, television, or 
slide/tape presentations. 

9. Have evaluation results reviewed by professional and lay 
people who have different points of view. Provide 
mechanisms for each interaction with the two objectives 
of: 1) having people with diverse viewpoints gain a better 
understanding of each other and the issues, and 2) obtaining 
new perspectives on the issues. Such an open dialogue 
promotes common understanding and increased support for 
subsequent actions. 

Although the heuristics described above may aid in utilizing evaluation 
results in policy-making, the key to success is in the personalities and 
politics of individual decision-makers. It is obvious that some 
decision-makers have the confidence and ability to attempt revolutionary 
changes while others are satisfied to let the "system" evolve at its own 
pace. Regardless of any particular situation, the evaluator must 
understand that evaluation results will always be used in a political 
context. Unless attitudes change, most systems will decide to provide 
sufficient, not optimal resources to an organization in helping it attain its 
goals. Credible evaluation, then, is essential for verifying the need for and 
the effectiveness of programs within the broader context of 
decision-making. 

A CONCEPTUAL FRAMEWORK 



Policy and Evaluation Defined 

It is not the purpose of this chapter to develop a complete conceptual 
framework for policy and evaluation. Its purpose, rather, is to show, in a 
practical setting, how these two interact. But it is recognized that some 
definition is necessary in order to provide a common understanding which 
will permit communication between author and reader. For the purpose of 
this chapter, then, the following definitions are presented: 

Policy: Guidance provided by an organization to decision makers 

Evaluation: The collection of information for use in making decisions 

The common factor in both definitions is "decision making." In some 
manner, both policy and evaluation are used by a person who is forced to 
make a decision. How they interact will, of course, vary with the 
situation. But the fact that they do interact is a major premise here and 
the fact that their interaction occurs in the decision making process is a 
second major premise. Therefore, we can only meaningfully define these 
terms dynamically, by considering how they function jointly and severally 
in their mutual environment of decision making. 

The Interaction of Policy and Evaluation 



Policy and evaluation both serve the decision making process. They
interact with each other in a variety of ways when decisions are made. To
illustrate, consider an unhappy consumer who finds that the radio he
recently purchased does not work. He returns to the store seeking to
return the radio. What possible reactions can he get from the salesperson?
The salesperson will either take the radio back or not. How the decision is
made depends on how the store uses policy and evaluation.

Scenario 1. The salesperson says: "It is our policy to accept all (or no)
returns, no questions asked." Here the decision appears to be based wholly
on policy. An evaluation of how the radio functions is not required to make
the decision. We should, however, suspect that evaluation did have some
influence on the adoption of a policy to accept all returns. Probably,
market research or the store owner's own informal collection of information
indicated that such a policy is, in the long run, good business.

Scenario 2. The salesperson says: "It is our policy to decide all
returns based on our evaluation of the product. We will have our service
department conduct the evaluation, and then we will decide." The decision
is based on policy and on information generated through a product
evaluation. That is, the decision reflects a blend of evaluation and policy.

Scenario 3. The salesperson says: "We don't have a general policy on
returns, but the radio is obviously defective, so you can return it." Having
evaluated the radio's function, it appears that the salesperson is able to
make a decision based only on evaluation and with no policy influence. But
the absence of policy is illusory. Implicit in the salesperson's willingness to
take back a defective product is a policy to be fair, or to please the 
customer, or to avoid legal action by recognizing an implied warranty. 

It appears from the scenarios that no decision is made without the 
influence of both policy and evaluation. Why, then, do our real life 
experiences lead us to think that some decisions can be made without the 
influence of policy; or without the influence of evaluation? The answer 
lies in our perception of the influences of policy and evaluation relative to 
each other on the making of the decision. Our perception of the relative 
influence of each is determined by the proximity of the influence to the 
decision situation. A graphic presentation of this model of the perceived
relative influences on the making of a decision appears in Figure 1.

At Point B, a decision maker would perceive that the decision is
influenced to a greater degree by policy than by evaluation. Similarly, at
Point D the decision maker would perceive being influenced to a greater
degree by evaluation than by policy. At Point C, the perceived influences
are equal. Points A and E do not exist in reality because every decision is
influenced by both policy and evaluation. The misleading perceived
relative influences are determined by the proximity of the influence to the
immediate decision to be made.

Point B represents Scenario 1. The perceived influence of evaluation
by the salesman is nonexistent. Actually, the influence of evaluation exists
but is perceived to be so small relative to the influence of policy that it
appears nonexistent when the salesman says: "It is our policy to accept all
returns." The perception of the influence is determined by the proximity
of the influence to the decision situation. The reason it appears to the
salesman that evaluation is not necessary for his decision to take the radio
back is that he is thinking in terms of the evaluation of the radio, the
object to be returned. He does not perceive the influence of a more
remote evaluation: the store owner's market research, which led him to
believe that the general policy of accepting all returns is good business. So
evaluation influenced the policy directly, but influenced the immediate
decision about the radio only remotely, and so it appeared to the decision
maker that evaluation did not influence his decision. Thus, it often appears
to us that policy (or evaluation) does not enter into the decisions we make.
However, if we look hard enough we find the remote influence of
evaluation (or policy) in every decision we make. The model just
described can be summarized by ten "rules":

1. The interaction of policy and evaluation occurs during the 
decision making process. 

2. The nature of this interaction is determined by the relative 
influence each has on the making of the decision. 

3. The perceived influence of policy and evaluation on the
making of a decision is determined by the proximity of the
influence to the decision making situation.

4. A decision can be made with the direct influence of both 
policy and evaluation in the decision situation (Scenario 2). 

5. A decision can be made with the direct influence of policy
in the decision situation and with no direct influence of
evaluation. The influence of evaluation on the decision
exists but is remote to the decision situation. A total lack
of evaluation influence is illusory (Scenario 1).

6. Similarly, a decision can be made with the direct influence
of evaluation in the decision situation and with no direct
influence of policy. The influence of policy on the decision
exists but is remote to the decision situation. A total lack
of policy influence is illusory (Scenario 3).

7. Policy is created or affirmed each time a decision is
made. It is common when making a decision to look at
earlier decisions made under a similar set of
circumstances. If the later decision is the same as the
earlier decision, the policy which influenced the earlier
decision is affirmed. If the later decision rejected the
influence of the policy applied in the earlier decision, new
policy may be gleaned from the new decision. Court
decisions function in this way (cf. Caulley and Dowdy,
1979).

8. Evaluation information is unaffected by a decision it
influences. The weight and credibility of the data in future
use, however, may be affected by how the decision maker
allowed it to influence the decision.

9. Policy has no influence on decision making unless the
policy is communicated and known to the decision maker.

10. Similarly, evaluation has no influence on the making of a
decision unless the information is communicated and is
known to (and understood by) the decision maker.



POLICY AND EVALUATION IN THE 
OREGON DEPARTMENT OF EDUCATION 

The ten rules present the essence of the policy/evaluation interaction 
model. The remainder of this chapter presents examples of this interaction 
in the Oregon Department of Education. For these examples to be
understood, it is first necessary to present some description of the
Department organization and how it functions.

Organization of the Oregon Department of Education 

Figure 2 presents the Department's table of organization. Note that
the five associate state superintendents report directly to the State
Superintendent of Public Instruction and not to the Deputy State
Superintendent. This has an effect on the making of policy. Further, you
will note that both the Superintendent and the Deputy, like each associate,
manage a division. Thus, they are involved in day-to-day operations and
do not exist in an ivory tower. This has an influence on policy making. The
Educational Program Audit Division was established to separate evaluation
responsibilities from program responsibilities. This was to relieve the
ambivalence of program support people (e.g., Title I field liaison people)
who put a great deal of energy into helping local district personnel develop
programs to aid children and later have to apply the (often antithetical)
evaluation rules developed by federal agencies or others in the
Department. For example, this Division monitors P.L. 94-142 compliance
while the Special Education Division provides technical support for
program development in the field. Additional examples are possible but
the point is made. This innovation has worked to the satisfaction of all
parties and has had a great deal of influence on thinking about and doing
evaluation-based decision-making throughout the Department. While the
remaining divisions do not have the evaluation responsibility of the
Program Audit Division, each still must do a great deal of evaluation
within its program support functions, and in this the Program Audit
Division assists by providing technical assistance.

Figure 2. Oregon Department of Education organization as of
September 1979

The State Board of Education, a lay board appointed by the Governor
with the approval of the Senate, has been assigned a policy making function
by the legislature. The State Superintendent, an elected official, carries
out the Board's policies. In doing so he creates a great deal of policy in
interpreting Board policy and in filling gaps in policy on issues not
addressed by the Board. The Superintendent's powers are unusual in that he
performs the same function as a district or circuit court in interpreting
state and federal statutes concerning education. Appeal from his decision
is made directly to the appellate division of the courts.

The policies of the Board reside in the administrative rules
promulgated by the Board. These follow a legislatively prescribed
procedure and have the same weight as legislatively enacted statutes.
Board policies in many areas have not been delineated, but must be gleaned
from the administrative rules. While the Board is currently considering a
process to develop a set of policy statements, Oregon practicality dictates
that the effort produce usable results. Flowery statements or broad
generalizations presented by other states do little to provide the guidance
to decision makers that the Oregon Minimum Standards for Elementary and
Secondary Schools (and the other sets of standards for other schools)
provide. The policies of the Superintendent are found in administrative
memoranda and other special memoranda he promulgates, and policy may
be gleaned from his decisions in controversies (a body of quasi-case law).

The Policy Makers 

While it is correct to say that the State Board of Education and the 
State Superintendent are the policy makers identified by the legislature, 
the reality is that many others make policy when carrying out functions 
delegated to them by the Board and the Superintendent. Therefore, to look
at policy in the Department, we must look beyond the table of organization 
for a functional description. 

The present superintendent has established a cabinet. It consists of 
the Deputy Superintendent and all of the associate superintendents. This 
cabinet meets several times each month as the Superintendent's Council.
Note that the apostrophe in "Superintendent's" is placed before the "s" and
not following the "s." Thus, the Council is advisory to the Superintendent.
The Superintendent maintains his freedom to hear the Council's advice and
to disregard it if a higher wisdom so dictates. The Council, however, is not
at all "window dressing." The discussions and the arguments put forth are a
serious part of the decision making process and also serve to keep the
associates current as to the Superintendent's policy. This policy provides
guidance to the associates when they make the many decisions required of
them in the operation of their divisions. The Council also serves to inform
the Superintendent of decisions made by associates (and other staff).
These decisions have established policy and, since the associates acted as
the Superintendent's agents, the decisions somewhat limit the
Superintendent's freedom to establish policy. This balance of policy making
roles is possible because of frequent formal and informal interaction among
all parties involved.

Others who influence policy are the special interest groups. The
Oregon Education Association, the Oregon School Boards Association and
the Council of Oregon School Administrators are just three of many. The
opinions of special interest groups are often solicited because they offer a
perspective which Department staff may not have considered. The same is
true of the opinions of citizens in general.

The legislature influences policy by enacting laws which dictate the 
limitations on policy in certain areas. More directly, the legislature 
influences policy through the budgetary process. The Department's budget 
is presented biennially to the legislature by the Board. The Board's policy 
to support programs for gifted students is given life by a dollar amount of 
support in the budget. The legislature, in modifying this allocation, places
a limitation on the Board's application of its policy. One may argue that
the policy to assist gifted students was not affected, only the degree of
support was. But the next time the Board builds a budget, the earlier
legislative action does affect policy.

The Evaluators

Those who hold payroll designations as "evaluators" reside in the 
Educational Program Audit Division. Approximately one percent of the 
total Department operational budget goes to support this division, but an 
estimated ten percent of the operational budget is used in evaluation
because every other division retains some evaluative function. So, just as
policy making does not completely reside in an identifiable few, neither
does evaluation. In fact, all policy makers evaluate or use
evaluation information (formally or informally) and all who make decisions 
about the conduct of evaluation are influenced by policy and establish 
policy by making decisions about what to evaluate. 



THE POLICY/EVALUATION INTERACTION
IN THE OREGON DEPARTMENT OF EDUCATION 

The Oregon Minimum Standards 

Since decisions of all kinds on all levels of organization are made
daily, many examples can be used to show how policy and evaluation
interact in Oregon. Of all of these possibilities, one has been chosen for
discussion because it involves everyone in the
Department and the State Board of Education. The Elementary-Secondary
Guide for Oregon Schools Part I (Oregon's Minimum Standards) has been
chosen as an example of (1) a policy setting process which is influenced by
evaluation information and (2) a statement of policy which influences
evaluation. The Minimum Standards are discussed here because they are
the Board's only complete statement about the Board's policy on evaluation
and about the policy/evaluation interaction. Even though the Minimum
Standards are applied at the local level, we use them here to illustrate
the policy that is applied at the state agency level as well. As in many
state agencies, there is no complete policy statement for the workings
of the agency itself. Informal transmittal of policy by the Board and the
Superintendent indicates that the same policies apply at the state level, but
there is no concise form to show the reader.

The Minimum Standards are a set of administrative rules established
by the State Board of Education under a legislative grant of authority to do
so. These have the same weight and influence on the schools as do
legislatively enacted statutes. The Board adopts many sets of
administrative rules but has chosen to designate only one subset the
"Minimum Standards." While a school district is required to comply with
all of the administrative rules, only failure to comply with a Minimum
Standard will automatically put into motion a process which requires the
district to correct its deficiency or lose its state funding support. This
process is the school standardization (accreditation) process administered
by a section of the Educational Program Audit Division in response to a
legislative mandate to determine that schools are meeting the standards
set by the State Board. Of course, failure to comply with an
administrative rule which is not part of the Minimum Standards subset
carries penalties too. Those penalties, however, are stated in each rule or
for a set of rules. But the school standardization process (team visits to
every school in the state on a five-year cycle) is concerned only with
compliance with the Minimum Standards.

Let us pause and see what we can detect so far concerning the 
policy/evaluation interaction.

— The Minimum Standards are the Board's expression of how a 
district must operate to provide a quality education to its 
students. These are statements of policy or rules based on
policy. 

— The legislative mandate that such policy be established was 
accompanied by a legislative mandate that the State 
Superintendent of Public Instruction determine whether 
schools are meeting these standards. This process of quality 
assurance is a process of evaluation. 

— Thus, the legislature, desirous of quality schools, required 
clearly defined policy for schools to follow and evaluation to 
make sure they do. How does the legislature ensure that the 
policy and the evaluation will interact? By placing the 
Superintendent in a decision-making role. The decision he 
must make is which school districts will continue to receive 
the state funds provided by the legislature. 

The discussion of the Minimum Standards now branches into two
streams. The first stream is a consideration of what the Standards contain
because that sheds light on how the Board is directing the school districts
to carry out the Board's policy, and how evaluation is required to ensure
that the policy is carried out. The second stream considers how the Board
establishes these Standards, that is, how the Board determines policy and
how evaluation information influences this policy making process. The
discussion of the first stream, the contents of the Standards, is based on
the Minimum Standards currently in use in Oregon and adopted by the
Board on June 23, 1976. A revision process has been underway since 1978
and a new set of Standards will probably be adopted soon. However, for
our purpose here, either set will do. The discussion of the second stream,
how the Board establishes standards, is a discussion of the revision process
which will result in the new set of Standards.

THE ROLE OF POLICY IN THE CONDUCT OF EVALUATION

The Content of the Minimum Standards

The 1976 Minimum Standards are divided into twelve sections:

Definitions. These are definitions of terms used in the
Standards. They are more than definitions for the guidance of
those involved with the Standards because these have been
adopted as an administrative rule, much the same way that
statutes contain definitions prefaced by "For the purpose of this
statute . . ."

Goals. The Board states its goals for students and for the
process of schooling.

Accreditation. The Board here describes in detail its process for
school standardization, that is, its process for responding to the
legislative mandate to evaluate schools for compliance with the
policy which is the major content of the Standards.

Instructional Planning. This section contains a rule which
requires local districts to link evaluation and policy. This rule
will be discussed in more detail below.

Instructional Programs. Here are the rules which express the
Board's policies concerning the contents of a quality education
program. These rules constitute the Board's response to the
legislative mandate to establish standards of quality.

Administration. Rules for the operation of a school district.
This section, coupled with the sections on Student Services,
Staff and Class Load, Media and Materials, Facilities, Safety and
Auxiliary Services, describes all policy for the operation of a
local district which is not contained in the Accreditation,
Instructional Planning, and Instructional Programs sections.

The purpose of examining the contents of the Minimum Standards was
to see what the Board requires of local districts as a mix of policy and
evaluation. It has already been stated that the accreditation section of the
Standards describes how the State Board will evaluate the local districts'
compliance with the Board's policies, so the accreditation section is a view
of the Board's policy on how the Board will assure compliance, that is, how
the Board will use evaluation. Our purpose is little aided by a detailed
discussion of how the evaluation is conducted. The rules which describe
this process are 581-22-202, 204 and 206.

We are interested, here, in how the Minimum Standards have required
local districts to use both policy and evaluation in the conduct of a quality
school program. We are interested in the policy/evaluation interaction at
the local level. Taking selected Standards, we will paraphrase or extract
excerpts for the sake of brevity, and use these Standards to illustrate the
interaction.

Standard 581-22-208, Instructional Planning. Each district is
required to establish district, program and course goals. These
goals embody the stated policies about the district's educational
program and the outcomes of that program. Once stated, the
goals form the basis of the local district's evaluation of its
program and policies about education. This evaluation is
required by the Standard. Following the evaluation, the district
is required to identify its needs by comparing assessment results
to its goals. The final requirement of the Standard is that, based
on this evaluation process, the district is required to establish
policies for making program improvements. Thus, we see that
the State Board has directed districts to rely on the
policy/evaluation interaction when planning their instructional
programs.
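
The needs-identification step that this Standard requires (comparing
assessment results to goals) can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it supposes that
each goal can be expressed as a numeric target, and the goal names, the
figures and the function name are hypothetical rather than drawn from
the Standard itself.

```python
def identify_needs(goals, assessment):
    """Return the goals whose assessed level falls short of the target.

    goals:      hypothetical mapping of goal name -> target level
    assessment: hypothetical mapping of goal name -> observed level
    A value of None marks a goal with no assessment data, so the size
    of the gap is unknown but the goal is still flagged as a need.
    """
    needs = {}
    for name, target in goals.items():
        observed = assessment.get(name)
        if observed is None:
            needs[name] = None               # nothing was assessed
        elif observed < target:
            needs[name] = target - observed  # size of the shortfall
    return needs


# Illustrative figures only (say, percent of students meeting each goal).
goals = {"reading": 90, "computing": 85, "citizenship": 80}
assessment = {"reading": 82, "computing": 88}

needs = identify_needs(goals, assessment)
```

In this sketch a goal that was never assessed is reported as a need of
unknown size, which reflects the point that a district cannot state its
needs without both a goal and assessment information to set against it.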

Standard 581-22-218, Educational Programs. The Board requires
each district to:

(1) Identify individuals' learning strengths and weaknesses;

(2) Provide learning opportunities for students responsive
to their needs;

(3) Determine progress students make in their educational
program;

(4) Maintain student progress records and report the
information to parents and students (OAR 581-22-218).

This required evaluation process is intended to achieve the Board's
policy that districts "provide all students opportunity to achieve
district-adopted learner outcomes, requirements for graduation and
personal goals through participation in educational programs relevant to
their needs, interests and abilities" (OAR 581-22-218).

Graduation Requirements. It is the State Board's policy that
local boards "shall award a diploma upon fulfillment of all state
and local district credit, competency and attendance
requirements" (OAR 581-22-228(1)).

Further, it is the State Board's policy that 

student transcripts shall record demonstration of minimum 
competencies necessary to: 

(1) Read, write, speak, listen; 

(2) Analyze; 

(3) Compute; 

(4) Use basic scientific and technological processes; 

(5) Develop and maintain a healthy mind and body; 

(6) Be an informed citizen in the community, state, and 
nation; 

(7) Be an informed citizen in interaction with
environment;

(8) Be an informed citizen on streets and highways;

(9) Be an informed consumer of goods and services;

(10) Function within an occupation or continue education
leading to a career (OAR 581-22-231(1)).

It is the State Board's policy that each local district express its
policy about what a competent graduate is: "The local board shall . . .
adopt and make available to the community minimum competencies it is
willing to accept as evidence students are equipped to function in the
society in which they live" (OAR 581-22-231(2)).

The State Board policy, then, requires the local district to evaluate
each student to see if the student has achieved sufficient competence to be
awarded a diploma.

Each local district enrolling students in grades 9 through 12 shall 
implement the competency component of its graduation 
requirements as follows: 

(1) Establish minimum competencies and performance 
indicators beginning with the graduating class of 1978; 

(2) Certify attainment of competencies necessary to 
read, write, speak, listen, analyze and compute 
beginning with the graduating class of 1978; 

(3) Certify attainment of all competencies beginning not
later than with the graduating class of 1981 (OAR
581-22-236).

We have seen that the State Board's policy is that Oregon graduates be
competent citizens. To achieve this, the Board requires each district to
state its policies concerning competent graduates, and the Board requires
an evaluation of students as a vehicle for effecting its statewide policy of
competent graduates. Oregon differs from some states in that, in Oregon,
evaluation of the graduates is conducted locally and is uniform only to the
extent that the general competency areas have been defined. In intent,
Oregon is no different from any state which establishes uniform exit
requirements and conducts a uniform statewide evaluation of all students.

Additional examples are possible, but the intent was to show that it is
the policy of the Oregon State Board of Education that (1) policies be
established and communicated, (2) evaluation be conducted to be sure these
policies are implemented, and (3) policy and evaluation interact by
requiring decisions (about programs and students) based on that
interaction. Here, then, we see that the content of the Minimum Standards
demonstrates that, in Oregon, evaluation is required by policy and policy
and evaluation interact in the decision-making process. We now turn to the
second stream of our discussion of the Minimum Standards to see how the
development of the policy is affected by evaluation.

THE ROLE OF EVALUATION IN THE DEVELOPMENT OF POLICY

We have seen that the Board's policy in the form of the Minimum
Standards requires districts to use evaluation information when setting
policy and when making decisions. It has been asserted that the policy
about evaluation-based policy making in the state agency is the same as
the policy illustrated by the excerpts from the Minimum Standards. It can
be demonstrated that the Board and the Superintendent rely on evaluation
information when establishing policy. Again, many illustrations are
possible, but it seems reasonable to show that the policy statement which
requires evaluation-based policy making (the Minimum Standards) is, itself,
policy based on evaluation.

When the revision of the current (1976) Minimum Standards began in
1978, the Superintendent and the Board wanted information about the
objectives of the Standards and about implementation problems. In
addition, they wanted information about the results of the implementation
of the 1976 Standards. A study was commissioned and conducted by the
faculty of the University of Oregon. The study was designed by the present
author. In order to show the reader a real example of evaluation in the
real life of the Department, the following discussion is based on the major
portion of the original paper which presented the design to the
Superintendent and the Board. This provides an insight into the realities of
conducting evaluation for use by lay policy makers.

EVALUATING OREGON'S MINIMUM STANDARDS

A Context in Which to Evaluate

Evaluation is the collection of information for the purpose of making
decisions. The evaluation of a program such as the Oregon Minimum
Standards consists of the collection of information for the purpose of
deciding whether the Standards are effective as they are or whether
change is needed. The information to be collected relates to goals,
implementation procedures and observable results. None of the
information, however, is useful unless there is a well-defined model for
program development and evaluation known to those involved in the
evaluation so that the collected information can be "inserted" in the proper
place in the decision making process. Simply put, if we do not know how
we wish to use collected information (i.e., what decisions we wish to
make), we will not know what information to collect nor will we know how
to use whatever information we do collect. Before we can begin a
discussion of the information we want and how we plan to get it, we must
view the context within which this evaluation should be conducted.

There are many planning models ("planning" as used here subsumes 
evaluation and the making of policy). The model presented to the State 
Board of Education and adopted tentatively as the Department's generic 
model appears in Figure 3. A simplified version appears in Figure 4.
This model for planning and evaluating programs and policy, briefly 
described, requires: 

1. The development of GOALS. 

2. The identification of NEEDS (by comparing "what we want"
(GOALS) to "what is").

3. The identification of LONG RANGE OBJECTIVES.

4. The identification of short term PROGRAM CHANGE
OBJECTIVES which, when achieved, will move us toward the
achievement of the GOALS.

5. The development of a PLAN to achieve some of the
PROGRAM CHANGE OBJECTIVES.

6. The implementation and eventual evaluation of the
effectiveness of the PLAN (an effort which seeks the
answer to the question "are the goals of the PLAN being
achieved?").

7. A judgment concerning "are the PROGRAM CHANGE
OBJECTIVES, LONG RANGE OBJECTIVES and original
NEEDS being met by this plan?" Following this effort we
make decisions about the efficacy of the PLAN and the
policies.

Figure 4. Oregon Generic Planning Model: Simplified Version

If an evaluator who knew nothing of the history of the development of
the Minimum Standards were given this generic model and a copy of the
Minimum Standards, he or she would probably assume that prior to the
adoption of the Standards, the Board:

1. established GOALS which described the Board's philosophy 
and what it hoped to accomplish through a variety of means 

2. used assessment procedures to collect information which, 
when compared to the GOALS, identified the NEEDS (i.e., 
the Board used the discrepancies between what it wanted 
and "what was" as statements of NEEDS) 

3. identified some LONG RANGE OBJECTIVES

4. identified some short term PROGRAM CHANGE 
OBJECTIVES which, if accomplished, would lead the Board 
closer to the achievement of its GOALS 

5. developed a PLAN (the Minimum Standards) which it hoped
would achieve some of the PROGRAM CHANGE
OBJECTIVES

6. implemented the PLAN with an evaluation design to answer
the question "Are the goals of the PLAN being achieved?"
All of this so that, within the context of the generic model,
the Board could appraise how well the identified NEEDS
were being met

The evaluator's assumptions would be reasonable but erroneous. In 
fact, prior to the implementation of the Minimum Standards, the Board did 
not establish GOALS, identify NEEDS, or identify LONG RANGE
OBJECTIVES or PROGRAM CHANGE OBJECTIVES. The Board did 
develop a PLAN, albeit in the absence of a sound context, but neglected to 
develop an evaluation component for the PLAN. As a result, the Board
cannot, it would seem, readily conduct a proper evaluation after the fact.
However, almost all of the parts of the generic model did exist and were
attended to, but the parts were not brought together as a cohesive whole.
It is possible to do so because of the vantage point presented by the 
passage of time and because the work done to date has been of high quality. 

We can conduct a proper evaluation because it is possible to
reconstruct some of the missing links in the generic model. The evaluation
answers four basic questions: "Have the goals of the Minimum Standards
PLAN been achieved?" "Are the PROGRAM CHANGE OBJECTIVES being
achieved?" "Are the LONG RANGE OBJECTIVES being achieved?" and
"Are the identified NEEDS being met?"

The Need for Reconstruction 

Consider the question, "Are the PROGRAM CHANGE OBJECTIVES
being achieved?" To answer this question we need to know what the
PROGRAM CHANGE OBJECTIVES are,2 what we are trying to achieve
through the implementation of the Minimum Standards (the PLAN).
Further, to evaluate the effectiveness of the PLAN we need to know the
goals of the PLAN itself, and the performance indicators, measures and
standards related to each of the goals of the PLAN. Of all of these, we
have only the PLAN. We must identify the PROGRAM CHANGE
OBJECTIVES, the goals of the PLAN and the performance indicators,
measures and standards related to each goal.

Once we have answered this question we can ask whether the LONG
RANGE OBJECTIVES and NEEDS are being met. To do so, however, we
must identify the LONG RANGE OBJECTIVES and NEEDS, and to do that
we must first identify the GOALS and the assessment information which
enabled us to determine needs relative to the GOALS.

Upon accomplishing all of this, we will have the elements of a proper 
evaluation. 

A PLAN FOR CONDUCTING THE EVALUATION 
Part I. Evaluating the Minimum Standards Out of Context

This section presents a plan for answering the question, "Have the
goals of the Minimum Standards PLAN been achieved?" As pointed out
above, this question is important but it is only one of four important
questions. Trying to answer the remaining three questions3 requires
placing the answer to the first question within the context of the Board's
generic planning model. Evaluation of the Minimum Standards within that
context is the subject matter of Part II.

To determine whether the goals of the Minimum Standards PLAN have 
been achieved we must first identify the goals and then identify
performance indicators for each goal. Following that, we must identify
appropriate measures and standards to determine whether the performance
indicators were achieved and then make inferences about the achievement
of the goals. We can define twelve tasks:

1. Identify the goals of the Minimum Standards PLAN.

2. For each goal, identify one or more performance indicators.

3. For each performance indicator, identify one or more
appropriate measures and performance standards which
will be used to collect and analyze information about the 
achievement of the performance indicator. 

4. Collect and analyze data. 

5. From information about performance indicators, infer
about the achievement of each goal. 

6. Identify the plan for implementation of the Minimum 
Standards. 

7. Identify the implementation plan goals. 

8. For each goal, identify one or more performance indicators. 

9. For each performance indicator, identify one or more 
appropriate measures and performance standards which 
will be used to collect and analyze information about the
achievement of that performance indicator. 

10. Collect and analyze data. 

11. From information about performance indicators, infer 
about the achievement of each implementation goal. 

12. Produce a report on the achievement of program and 
implementation goals.



Part II. Evaluating the Minimum Standards Within the Context of the
Generic Planning Model

Here we attempt to answer the questions, "Have the LONG RANGE 
OBJECTIVES been achieved?" "Have the PROGRAM CHANGE 
OBJECTIVES been achieved?" and "Are the NEEDS being met?" To answer 
these questions we must first have the answer to the question posed in Part
I, "Have the goals of the Minimum Standards PLAN been achieved?"
Before we can answer these three questions, we must also fill in the
missing parts of the generic planning model. These missing parts are the
State Board of Education GOALS, identified NEEDS, LONG RANGE
OBJECTIVES and PROGRAM CHANGE OBJECTIVES. We can define six
tasks:



1. Identify the State Board of Education GOALS. 

2. Identify the NEEDS.

3. Identify the appropriate LONG RANGE OBJECTIVES.

4. Identify the appropriate PROGRAM CHANGE OBJECTIVES. 

5. Apply the data garnered from the evaluation of the 
Minimum Standards PLAN (Part I, above) to: 

a. determine whether the PROGRAM CHANGE 
OBJECTIVES were achieved 

b. determine whether the LONG RANGE OBJECTIVES 
were achieved 

c. determine whether the NEEDS were met 

d. infer about the achievement of GOALS 

6. Produce a report on the achievement of LONG RANGE
OBJECTIVES, PROGRAM CHANGE OBJECTIVES and the 
meeting of NEEDS. 



Results of the Study 

The preliminary portion of the evaluation study has been conducted. 
The results of an extensive survey of several school-based populations are 
available. These results are currently being used by the Superintendent and
the Board in the setting of new policy. 



SUMMARY

This chapter has explored a conceptual framework for the relationship
between policy and evaluation and has provided examples of the
policy/evaluation interaction in the Oregon State Department of
Education. The author has attempted to show that policy makers are very
much dependent on evaluation information and, in fact, are required to be
so dependent by the policies of the State Superintendent of Public
Instruction and the State Board of Education. In addition, it has been
shown that evaluators and the nature of evaluation are guided by the
Department's written policy.



FOOTNOTES



1The Oregon Planning Model was developed jointly by Gordon
Ascher, Robert Clemmer, and Donald Egge, all of the Oregon State
Department of Education.

2We could use the past tense, but we want our results to be useful
now, so we will forget historical objectives and work with those currently
in place.

3The remaining three questions are: (1) "Have the PROGRAM
CHANGE OBJECTIVES been achieved?" (2) "Have the LONG RANGE
OBJECTIVES been achieved?" and (3) "Are the NEEDS being met?"
PART II

Analysis of the Case Reports 



In Part II, two substantive specialists provide integrative analyses of
the previous six chapters. Thomas F. Green attempts to clarify certain 
features of the interaction between policy and evaluation. He explicates 
what is meant by the question, "What is a policy question?" Green comes 
up with the startling conclusion that neither the most efficient action nor 
the most technically proficient analysis will suffice to resolve the central 
conflict between the social aims which give rise to a policy question.
Wise policy is not made with enough knowledge to determine a decision, 
and policy questions are never asked out of a primary interest in adding to 
our knowledge. Green argues that our answers to policy questions may be
improved by obtaining better information and doing better analyses which
will then be more rationally persuasive. But such questions can, will, and
usually are, answered even without such information. It is clear from
Green's discussion why evaluation findings can never completely determine
a decision.

In the second part of his chapter, Green examines the different facets
of the policy process— policy analysis, policy formation, policy decision,
and political analysis. He defines policy analysis as the rational or
technical assessment of the net marginal trade-offs between different
policy choices. Policy formation is that activity by which we seek to gain
agreement on what form a specific policy can or will take, as opposed to
what form it ought to take. A policy decision can be described as the
authoritative decision of some officer, administrative or legislative, by
which he or she establishes, for the moment at least, a line of action.
Unlike policy analysis, political analysis concerns not determining the net
benefit of a given course of action, but rather determining its political
weight.

Through the use of extensive footnotes, Green establishes links
between his conceptual analysis and the preceding six chapters. His
footnotes should definitely be read, therefore, within the flow of his
analysis.
analysis. 

The purpose of Nick L. Smith's chapter is to use the preceding six
chapters to illuminate the range of factors which influence state
department evaluators' practice. What accounts for the differences in the
structure and function of evaluation units within state departments? What
influences the nature of evaluation practice within these settings? These
are questions addressed in his chapter, as he discusses five sources of
influence on evaluation practice in state departments of education: the
influence of the federal government, state governments, the state agency 
itself, local school districts, as well as the influence of other groups. 

Smith finds that within these research and evaluation units, political
and legal considerations are just as important as technical considerations,
and that most evaluation attention is focused on management assistance or
policy analysis to the general exclusion of the improvement of instruction.
Because of such foci, the ability to communicate and persuade in a highly
politicized environment is an essential skill. Astute budgetary and
financial analysis, problem definition, understanding of the state context,
and the ability to know what can be affected and how within the state
setting, are also needed for effective evaluation within state departments.

Both of these chapters give penetrating insights into the relationship 
between policy, evaluation, and decision making within the research and 
evaluation units of state departments of education. 
CHAPTER 7

Policy and Evaluation: A Conceptual 
Study 

Thomas F. Green

What is (or can be) the relation between public policy, on the one hand, 
and evaluation, on the other? Is there a way to attain clarity in 
understanding what an appropriate relationship would be? I intend to 
answer these questions in two steps. I shall consider first the typical 
character of policy questions. In a second part, I shall examine different 
facets of the policy process— analysis, formation, decision, and political 
judgment. I believe that these steps, taken together, will allow us to form
a view, however tentative, about the relevance of evaluation as a 
professional practice to various aspects of public policy. 

POLICY QUESTIONS

There is probably no single definition of "policy" adequate to capture 
the full range of ordinary usage. Such a definition would have to 
satisfactorily capture the likenesses and differences between managerial 
decisions, guides to practice, rules of legislation, basic choices of political
direction, and the bare application of standard requirements in
administration (No one gets unemployment assistance for more than
twenty-one weeks). It would have to include some matters that fall under 
"Standard Operating Procedures" (file expense accounts within ten days 
with receipts), matters of personal practice (I don't answer the phone at 
home because it's never for me. Avoid arguments in the office. Don't give 
G-3s unsatisfactory ratings; it takes too long to defend them.) Although 
each of these things can be called "policy", the term has, in each case, a
slightly different meaning.

On the other hand, it is possible to establish some common features of
what we ordinarily take to constitute a policy question, especially if our
concern is with public policy. So I wish to render the question, "What is
policy?" by asking "What is a policy question?"

A policy question is a request for a line of action aimed at securing an
optimal resolution of a conflict between different goods, all of which must
be accepted, but which, taken together, cannot all be maximized. That is
to say, we do not have a well-formed policy question, a fully formulated 
statement of a policy problem, until we are able to state the set of values 
or goods from which the question arises, and unless we are able to state
that set of values or goods, so that we can discern their mutual
inconsistency.

The issues involved in the contemporary movement for fiscal reform in 
education provide about as clear a model of policy questions generally as it 
is possible to shape. The policy issues are always "nested" within a set of 
mutually incompatible values or goods. We seek 

1. equal educational opportunity for children 

2. an equitable distribution of the tax burden 

3. local control of education 

4. responsible management of the State budget

Maximizing any one of these goods— that is, getting as much of it as we
can— will do damage to the advancement of the others. The policy problem 
is generated by the fact that we accept all four of these aims and yet they 
cannot all be maximized. We cannot have all the local control possible 
because doing so will probably mean getting less than would be good in the 
way of equity for children and taxpayers and control on the public budget. 
On the other hand, if we maximize equity for children, then we are likely
to get more inequity in the tax burden and less local control. The problems
of educational finance policy, in short, do not arise merely from the need 
to establish a more equitable system for taxpayers and children. They 
arise rather from the need to do so within a system of public goods that 
secures also both local control and responsible public management. 

I daresay that all issues that we would describe as questions of public 
(or even personal) policy have this feature.2 They are always "nested" in
a set of social values or social goods which must all be considered, but
which, taken together, are more or less mutually incompatible. Consider
the issues surrounding the imposition of exit standards at the secondary
school. Here we seek the mutual benefits of:

1. universal attainment (or at least racially and ethnically
balanced frequencies of attainment) 

2. common standards of achievement 

3. culturally pluralistic communities 

The mutual inconsistency of these aims is transparent when they are 
visibly juxtaposed. The difficulties of finding some balance in the entire 
set are well illustrated by recent experience in Florida and Virginia.
Policies are established to maximize common standards of achievement.
The immediate consequence is to pay a price in securing the other two
goods. So the policy is adjusted— or its implementation is delayed. It is not
unreasonable to surmise that continuing adjustments will result in policies
that appear on the surface to depart substantially from previous
formulations, but whose consequences are not substantially different. Such
a result is likely to emanate more from the presence of mutually 
incompatible values or goods than from administrative "bungling," or 
blindness, or from inefficiency, or from political chicanery. That is to say, 
neither the most efficient action nor the most technically proficient
analysis will suffice to resolve the central conflict between the social aims 
within which the policy question resides. In general, there is no technical 
solution to a policy question. For example, there is no purely technical 
resolution of the fact that if, in social institutions, we get all the 
efficiency we can, then we are likely to have less community than we need
or desire.

This conclusion, however, may seem outrageously facile. It deserves
some explanation, and that explanation can be discovered in two points. 
The first requires that we grasp the important fact that what counts as an 
answer to a policy question always takes the form of a "What we should do" 
and never a "What we know." Only practical questions are admissible in a
public forum, never theoretical questions. And this is fortunate indeed. It 
means that in the domain of policy we are able to arrive at agreement on 
what to do without having to agree on the reasons for doing it. We must be 
able to agree on a line of action and stick to it, even when we do not agree
on what is good and even when we have different goals.

The result of a policy question is always a decision and an action. The 
result of a theoretical question is always a truth claim. Policy deliberation
is aimed at action, not at the acquisition of knowledge; theoretical 
questions are always aimed at the acquisition of knowledge, not at action. 
I do not mean by this claim that we can or ever should make public 
decisions without knowledge. Social action, no doubt, should be informed. 
Nor do I mean that we can ever gain greater knowledge without some 
action. Research, after all, is a kind of action. I mean only that wise 
policy is never made with enough knowledge to determine a decision, and 
policy questions are never asked out of a primary interest in adding to our 
knowledge.5

In this day and age it is not hard to imagine someone saying, "If we 
just had a methodology sufficiently sophisticated and a body of relevant 
data sufficiently refined, then we could answer whatever policy questions 
may come along." Such a person has been captured by a delusion. The 
delusion consists in supposing that a policy question is a theoretical
question when, in fact, it is not. Any time we suppose that a policy
question can be resolved by some addition to our knowledge, then it will
turn out that what we supposed was a question of policy has turned out to
be merely a problem of engineering or efficient administration instead. 

My point then is not that we should abandon all attempts to improve 
our methods of evaluation or policy analysis. My point is rather that since 
our indecision in matters of policy does not arise from the lack of such 
methods, therefore, it is unlikely to be laid to rest by their development. 
In matters of policy, we are confronted with indecision not because our 
knowledge or technical facility is faulty but precisely because we are 
confronted with a kind of question that, in principle, cannot be answered by
any increment or improvement of knowledge. Our answers to policy
questions may be improved by better information and better analyses in the
sense that they will be more rationally persuasive. But such questions can,
will, and usually are, answered even without such information.

Furthermore, it is not obvious that the answers given in the absence of 
such analyses are worse than or even often different from the answers that 
would be given in their presence. In short, the answers may be "better
grounded" rationally, but still not different or better in any other
sense.6 We can, no doubt, do something more rationally persuasive than
consulting chicken entrails, but we are unlikely to get anything that in its 
results is quite as decisive. And this is so because of the properties of 
policy questions, not because of deficiencies in policy evaluation. 

This observation brings me then to a second important reason why 
there are no technical solutions to policy problems. In paradise, there is no 
policy— except, perhaps, admissions policies. But why do policy questions 
not arise in paradise? There are many ideas of paradise, of course, but on
the whole, one is inclined to the view that men conceive of it as a perfected
state of affairs in which wants presently denied in an imperfect world will
somewhere or sometime be satisfied in a perfected world. Consider the
view Emerson expressed in his essay "On Compensation." He heard a
sermon, the chief message of which was that those saintly and good souls 
of the world who suffer without the comforts and amenities of life should 
persevere nonetheless in their goodness. For though sinners, with their fine 
carriages and furs, may seem to prosper now, they shall suffer later; and 
though saints may have to do without, they shall later be rewarded. The 
import of such a sermon, thought Emerson, was the message of saint to 
sinner, "You sin now, I shall sin later, I would sin now, but I can't." This is
one rendering of the view that heaven is that perfected existence in which 
wants present, but now denied, will be satisfied. 

All this suggests, of course, that the problem of optimality— therefore 
the need for policy— would be banished from any world in which human 
wants or desires are perfectly balanced by their satisfactions. What else 
could paradise be except a condition in which all human desires are 
satisfied?

There are two general strategies always sufficient to produce such a 
solution. The first lies on the side of doing something about the
satisfactions available to human beings (productivity), and the second lies
on the side of doing something about their desires (education and the
development of character). The first leads us always to solve the problem
of optimality by the provision of abundance; there is presumably no
problem of satisfying wants. There is enough of everything, including
enough justice and enough virtue. When there is no problem of satisfying 
wants, there is no conflict of goods and, therefore, no problem of 
optimality. 

The other strategy is the converse. There is no scarcity of what 
nobody wants. So the second way to resolve the problem of optimality lies
not in the satisfaction of wants, but in their control, their composition.
Thus, for Gandhi, diamonds and mink were plentiful, not because they were
any the less scarce, but because they were not wanted. And not being
wanted, they were abundant. If heaven is that condition in which wants are
satisfied, then there may be abundance in heaven not because goods are
maximized, but because wants are "composed." In neither case do
problems of policy arise, and the reason they do not arise is that such 
methods of composing the goods of the world or of reconciling wants and
satisfactions would render choice unnecessary. There is no conflict of
goods in paradise. And the fact that there is consequently no need for
policy is part of the proof that to formulate a policy question is to 
formulate the conflict of goods within which it is "nested." 

But why do we not imagine paradise to contain interpersonal conflicts 
of wants? The answer is that to do so would involve either the judgment 
that wants are improperly controlled or that goods are insufficiently 
supplied to satisfy them. Such a condition would introduce a problem. 
Paradise would no longer be a perfected existence. It would contain the 
problem of optimality, a kind of allocational defect. Paradise would 
contain the problem of composing the most satisfactory combination of 
what goods do exist, and who should get them in what degree. In short,
such a condition would introduce into paradise precisely those 
circumstances that create the need for policy and that dictate the features 
of any well-formed policy question— What are the goods in conflict? What 
is their best possible adjustment? How can we reach it? What are the 
trade-offs?

The main point of this apparent digression is that problems of policy 
are an immediate and direct reflection of some immensely fundamental 
characteristics of the world and of human existence within it. They arise
because the goods— not simply the interests— that human beings seek to 
secure in the world are interdependent and do conflict, not all the time in 
every respect, but all of the time in some respects. Only in paradise can 
we imagine that all human goods are simultaneously in sufficient supply so 
that there is no conflict in their allocation. It is important to. note that 
since knowledge is a certain kind of good, then the idea of paradise 
includes the assumption that there iis sufficiency of knowledge. But 
paradise does not drise because there is an abundance of knowledge. Policy 
questions are not banished from paradise because our capacity to know is 
perfected. They are banished rather because there is either an abundance 
of all goods^ or because there is a suitable composition of desires. 

This formulation, however, is not all that is needed in exposing the
presuppositions of policy questions. It deals with the presupposition of 
scarcity, but not with the presupposition of interdependence between 
goods. It is sufficient to show that when scarcity is absent— as it is in 
heaven— then no policy problems can arise. But the example presupposes 
that it makes conceptual sense to suppose that all human goods can exist in 
abundance simultaneously. Our imaginary hypothesis assumes that
abundance of some goods can always be secured without significant costs in 
others. 

The fact is, however, that human goods do conflict in such a way that
they cannot all be provided simultaneously in sufficient supply to satisfy
human desires. The point is central. Human goods do conflict so that the 
price of securing the abundance of some is always failure to secure as 
much as we would like of some other. Thus, if we succeed in providing as 
much equality as is wanted, we are unlikely to have as much liberty as is 
wanted. If persons develop as much tolerance for ambiguity as is wanted, 
we are unlikely to have as much courage as is wanted, and the price will 
sooner or later become apparent. 

The view then is that human goods conflict (read "values" if you wish)
not simply because they are in short supply, nor simply because
interpersonal preferences conflict, but simply because they are not
structurally consistent. They cannot all be maximized, even in paradise. It
is a familiar idea that human wants or human interests conflict. But the
view here is that human goods conflict. The fact that human interests
conflict is what produces political problems, finding an adjustment between
conflicting human interests. But the fact that human goods conflict is
what produces policy questions, finding an adjustment between conflicting 
goods. Human goods continue to conflict even when human interests do 
not. Even when all are agreed on a single predominating interest— victory 
in an all-out war, for example— there will remain policy problems. 

In other words, there are no unlimited goods in the world. There are 
no goods which, if provided in great abundance, would not have the 
consequence of certain other goods being in short supply. The ultimate 
solution to any problem of policy is, therefore, to be found only in paradise;
that is, only under conditions in which problems of policy are not so much
solved as they are simply non-existent. Such a state of affairs may be
ideal; and in that respect, it may be optimal. But it is not possible, and in
that respect, it is not optimal at all. That the ultimate solution of all
policy problems is to be found only in paradise, may be precisely the fact
that gives rise to the consistent and apparently ineradicable human impulse
to think of social solutions to policy problems in Utopian terms. 

But Utopian thinking is defective not merely because it pays too little 
attention to feasibility. It is flawed more fundamentally because it pays no
attention at all to politics. It is a fact of large significance that no
well developed literary exposition of Utopia ever includes an account of 
politics. The central assumption always is that in Utopia the needed 
balance between conflicting human goods is resolved. Reconsideration is 
not needed. Therefore, the introduction of politics into Utopia would be a 
threat to— not a part of— the good life.9 In Utopia, problems of policy
remain only to the extent that there remain problems of monitoring the
society and managing its affairs. Politics is replaced by administration,
and the role of evaluation, if it exists at all, would be reduced to serving
the ends of management.10

Policy questions do not arise at all in paradise. Serious ones do not 
arise in Utopia. But the reasons are different. The reason that serious 
ones do not arise in Utopia is not that goods are abundant or that desires 
are composed, but that the inherent conflict between goods is taken as 
resolved. All that remains is management. Not even the presumed Utopia 
of putting evaluators in charge would alter that result. 

We must note a related feature of policy questions generally. Like a 
reporter filing his story for the evening edition, whoever will answer a 
policy question, in the real world, must do so within strict constraints of 
time. The reporter files a story by deadline, but always with the 
knowledge that there will be another deadline, and the present story can be
amended by the next as events change and further facts are revealed. Two 
points are discernible in this observation. The first is that policy questions 
generally are answered in anticipation that the answer will be revised. The 
second point is that they are the kinds of questions that have to be 
answered on time, even though the information needed for the answer is 
not on time. Both points arise from temporal constraints, but they have 
different implications. 

The first implies simply that policies are impermanent. We expect
them to change. Often, they are not even very durable. They are not
supposed to be. In that respect, policy questions are unlike constitutional
questions, and they differ from moral questions in exactly the same
respect. We do not expect persons to change their moral principles (note
the offense involved in calling them "policies")—the constitution of their
character—with great frequency. We do, however, expect policy to change
with experience and with fair frequency. "Policy" implies "politics" and
"polity"—a point that I shall consider in more detail later. We may note,
for the moment, however, that if there is a practice whose improvement
would promise the largest marginal gains in the formation of policy, it 
would be an improvement in the practice of politics, not evaluation. 

The second point is equally vital. It means that just as it is better for 
the reporter to file a story on time without all the facts than to get all the 
facts and file the story too late, so also it is better, in the case of policy, 
to make a decision on time, but without all the facts, than it is to get all
the facts and make the decision too late. In the case of policy, decisions
have to be made always within large limits of uncertainty. Some reduction
in the degree of uncertainty will be helpful, but the degree of reduction
normally required for academic research is both improbable for policy
decision and would often be undesirable even if it were not improbable.11

In other words, crude data arriving on time are always better than 
refined data arriving too late. So it is acceptable, even fortunate, that the 
methods required for policy decision are crude, even though the usual 
methods of research are necessarily refined. To answer a policy question, 
we need as much information as we can get. But "as much as we can get" 
usually turns out to be less than we could get if we had more time and, at
the same time, more than can be used and more than will make a
difference to the decision. Policy questions, in other words, are always
answered in the midst of uncertainty, and there is always a point beyond
which more information, however excellent, will contribute little to
the reduction of that uncertainty and do nothing to alter the direction of
the decision.12

All this is simply another aspect of the claim that policy questions are 
practical rather than theoretical. They are questions of the sort that have
to be answered, and that will be answered, even when we do not know
which among alternative answers is the best. If these observations are
credible, then it is possible to understand the claim sometimes advanced
that academic research is useful for policy decision inversely to its
excellence as academic research.

But there is another, and more far-reaching, implication of these last
observations. Suppose we define the character of a professional, as I think
we must, in relation to the degree to which the professional's practice
requires accuracy of approach in the midst of uncertainty. If we imagine a
two-dimensional space defined by (x) increasing uncertainty in the
predictable behavior of the materials dealt with in any practice, and (y)
increasing uncertainty in the consequences of one's actions, then we would
be able to array the practice of professionals, craftsmen, artists, and
technicians as dispersed along the diagonal.

I would suggest (and this is only conjecture) that such a definition of
"professional" would capture important features (though not all important
features) of what we ordinarily mean by the term. But such a definition
has interesting and important consequences. It means, among other things,
that the education of professionals is an education in being able to make
judgments and decisions in the context of large uncertainties. Such
education, in a deep sense, is education not in technical skill (nor without
technical skill), but in the capacity to deal with uncertainty, to live with
doubt, to change one's mind. It is always ultimately a kind of education in
self-knowledge, learning one's limits.

But more importantly, and more concretely, it would tend to rearrange 
our conceptions of the relation between the practices of crafts and
professions. For example, it would mean that the neuro-surgeon, although
possessed of greater technical skill and manual dexterity, nevertheless
deals with a more predictable problem than the nurse, who, having to deal
with the whole patient, acts in the midst of larger, though often less
serious, uncertainties of both material and consequences. In short, by such
a view, the nurse turns out to be engaged in a practice that is more
professional than the surgeon's. 

In an analogous way, the implication is that the evaluator, insofar as 
he presses for greater certainty and actually seeks to become the 
determiner of policy, is less of a professional, at his best, than the
politician or executive, at his best. The drive of evaluators, to whatever
extent they seek to find in practice a means of resolving policy questions, 
is in fact the drive to make the politician an evaluator, which is to say a 
technician of policy decision. Such an achievement, if ever realized, would 
constitute the most radical transformation of our political institutions and 
the practice of policy decision that one can imagine. Even if it could 
happen— which is doubtful— it would be undesirable if carried very far. But 
this discussion cannot be extended, refined, or made convincing without a
further set of distinctions. 



FACETS OF THE POLICY PROCESS

Between policy analysis, policy formation, policy decision or
promulgation, and the political analysis of policy there lie clear 
differences, and the practice of evaluation will relate differently to each. 
The tendency exists to regard these four activities— analysis, formation, 
decision, and political analysis— as steps in the policy process. But that 
view is misleading, because these activities are never fully discrete in 
practice and they do not occur in any persistent sequence. Nevertheless, 
there is a distinction of practice corresponding to each activity, and each 
practice, moreover, has its distinct kind of theory. 

Policy Analysis 

Policy analysis can be defined as the rational or technical assessment 
of the net marginal trade-offs between different policy choices. The 
question becomes, "Which set of values will be advanced, which will not,
and with what net benefits?" This is the same kind of question that we 
confront, say, in the design of a hand drill. What should be the design? 
The question is "nested" in a set of values. We want low cost, high safety, 
ease of handling, and durability. We can ask and rather precisely
determine what marginal gains in one of these values will produce what
corresponding costs in the others. If we "go for" greatest durability, then
we are likely to get a higher cost and less ease of handling. If we "go for"
the lowest cost possible, then we are likely to sacrifice something in the
way of durability and safety. The design problem is to discover a balance
between these competing values.

Enter the problem of incommensurabilities! How do we determine 
which among the competing values is to be given greatest weight? Which 
has the greatest worth: low cost, safety, durability, or ease of handling?
Not even the most refined analysis of the costs and benefits will solve that
problem. Such an analysis gives us the possibilities or a set of choices, but
it does not pick out any preferred answer from within the set. Yet we need
some procedure for doing just that. In short, we need a market decision,
and getting a market decision is, no doubt, going to require a market 
analysis. 

Is our market made up of professionals? Or does it consist essentially 
of amateurs and household craftsmen? If it is the former, then the 
problem will probably be resolved on the side of durability and safety with
a slightly higher price. If the latter, then, by all means, the decision will
probably be to minimize cost and sacrifice durability and, to some degree,
safety. But then again, the market decision might be to "go for" the whole
range of the market. Produce a variety of designs representing the full
range of choices revealed by the analytic exercise. Something for
everybody!

These activities are roughly analogous to the distinctions I want to
make in the case of policy. Merely setting forth the marginal costs and
benefits of a range of choices is one thing: policy analysis. Selecting
one balanced choice or a range from within the possibilities is another
thing: policy formation. The decision as to which choice or choices will be
made is still a third: policy decision. And performing the market analysis
needed for that decision is yet a fourth: political analysis.

For example, suppose we entertain the prospect of distributing
educational assistance to students in preference to institutions and that we
are resolved to do so on the basis of financial need. In that case we require
access to financial information, and not simply on groups, trends, or
categories of persons, but on each actual individual. If we propose this
kind of policy as more just than other choices, then, in the name of justice,
individuals will have to reveal personal information that may have been
regarded before as privileged. Two values conflict. We extend justice, but
diminish, in some measure, privacy. To secure a definable gain in one, we
pay a definable cost in the other. Policy analysis asks, "What is the net
marginal gain?" A truly refined policy analysis, which rarely exists, would
tell us how much we are likely to gain in the advancement of justice for
some corresponding cost in privacy. But no such analysis, no matter how
refined, will tell us whether it is worth it. In order to resolve that
question, we need something corresponding to a market analysis and a
market decision. We need a political analysis and a political decision.
"Policy" implies "polity" and "politics" just as "good industrial design"
implies a structure for marketing analysis and marketing decision.

But consider another example. A Congressman asks whether 
pass-through requirements for allocating Title I (ESEA) funds should rest on 
tests of educational need rather than economic need. The answer comes
back couched not in terms of "whether we should" but in terms of "what
happens if we do?" That's policy analysis. In either case, the funds would
go roughly to the same school districts, but not quite. What's the margin
of "not quite"? Is "not quite" "very much"? Is it "enough to matter"? And
even if it is "not much," would the change create incentives for local
districts to pay more attention to "educational need" in answering
allocational problems? And if so, then would the incentives be enough to 
make a difference? And if so, then (here we are again) how much of a 
difference? That's policy analysis. 

But policy analysis does not, and need not, stop there. It can ask not
simply what the net consequences would be of doing X, but what those net
consequences would be compared to doing Y, where Y is either what we are
doing already or some third alternative.13 The question for policy
analysis is not whether doing X is a net improvement over doing Y (better
than doing Y) but simply, what are the net effects? Whether it is better to
have a drill of low cost instead of high durability will not be determined
simply from an analysis of the trade-offs. It requires a marketing 
decision. Similarly, whether given the different consequences, it is better 
to do X than Y in public policy will not be determined by a policy analysis. 
It will be determined by a political decision resulting from a political 
process involving a political analysis. 

In short, policy analysis is that rational, technical, analytic
performance in which the central question is not whether X is a good thing
to do, but simply what are the marginal effects of doing X, and what are
the marginal effects as contrasted with doing something else instead?
Hence, policy analysis is simply an activity whose theory is the theory of
marginal utilities. It is, by all accounts, an activity that consists in the
exercise of theoretical, rather than practical, rationality. It assumes that
the policy question is "nested" in a conflict of values present as objective
states of affairs in the society. It is an activity in which evaluators may
take a leading role provided that they do not suppose they are actually
evaluating policy, as opposed to merely recording, either in prospect or in
retrospect, the consequences of doing X or Y.14

Policy Formation 

Policy formation is an activity of a contrasting genre. Policy
formation is that activity by which we seek to gain agreement on what
form a specific policy can or will take, as opposed to what form it ought to
take. Not even by the most refined policy analysis will we have actually
formulated a policy statement. Indeed, policy analysts are not typically in a
position to actually formulate policy.15 For the latter, we need to
engage in conversation, persuasion, argument, and in (seemingly) endless
meetings with those who will actually pen the regulation, mark up the bill,
establish the procedures, write the guidelines, etc. The theory of policy
formation can then be discerned as one aspect of the theory of government
management and rhetoric. At the Federal level, it usually turns out to be
the theory of inter-agency politics. "Don't fight over turf; just take up
space" is a rule for the conduct of policy formation. I include here the
theory of rhetoric because clearly it makes a difference what things are
called. The same policy that under one name may never see the light of
day will, under another name, pass without objection. Calling it "school
aid" may defeat it; but calling the same thing "national defense" may
insure its acceptance. "If it matters what you call it, then call it
something that matters" is another guiding rule in the theory of policy
formation.16



Policy Decision 

Policy decision can be described as the authoritative action of some
office, administrative or legislative, by which a line of action, for the
moment at least, is established. Policy decision is not so much an activity
or process as it is a momentary end point in the continuing business of
government. It is that end point that is sometimes supposed by the naive
to capture the entirety of the policy process, as though making policy
could be reduced simply to an act of will or the result of divination. The
theory of policy decision is simply the theory of the polity itself. It is the
political and legal theory by which authority is distributed, obligations for
decision are assigned throughout the structure of political institutions, and
agents of authority are enjoined to act.

Political Analysis 

Unlike policy analysis, political analysis is concerned not with 
determining the net benefits of a given course of action, but with their 
political weight.17 The aim is not so much to determine the net social
benefits of a particular policy, but to determine its constituency. If policy
analysis is concerned with establishing what is good, then political analysis
is concerned with estimating who will vote for it, whether the best thing to
do is the same as the best thing that can be done. Hence, the theory of
political analysis is the theory of political behavior.

We may gather these thoughts together in a brief culminating
summary. The theory of policy analysis is the theory of marginal utilities.
It establishes the set of policy choices.18 The theory of policy formation
is the theory of inter-agency politics. It is the governmental process by
which a course of action comes to be selected. The theory of policy
decision is nothing less than the theory of the polity itself, and the theory
of political analysis is the theory of political behavior. When we view all
of these activities together, not as discrete steps in the policy process, but
as distinct facets of a social process, now one feature predominating and
now another, then we can discern more clearly where the professional
practices of evaluation fit and what their relevance is to the creation,
promulgation and implementation of public policy.

Evaluators and evaluation can contribute to each of these activities, 
but not to each in the same way. For example, the rational standards of
policy analysis are the standards of theoretical reason, but the rational
standards of policy decision and political analysis are the standards of
political judgment. These are practical activities. This difference may
help to explain why it is that when the question, "What should we do?" is
given a policy analysis, we may get one answer, and when given a political
analysis or when rendered in a policy decision, we may get an entirely
different answer. In short, the exercise of political judgment is a practical
activity. It is also an evaluational activity. But the result of that activity
may differ from or even contradict the results of policy analysis. What we
should do, even the best thing to do, may turn out to be one thing by
policy analysis and a very different thing when it comes to political 
decision. 

The professional evaluator can contribute in the context of 
government, but he will contribute to all of these activities only to the
extent that the evaluator becomes also a politician and a political advisor.
Consider, for example, the case of policy formation. The evaluator, as
professional, can contribute, but that contribution will be most substantial
to whatever extent he becomes a student of bureaucracy and a trusted
counselor to authoritative leadership. 

So the dilemma is this. Each of these activities involves evaluation in
some broad sense of the term. Each involves evaluation in the sense, say,
that buying a camera does. But only in the case of policy analysis is the
evaluator's role, as professional, undiluted by the need to take on other
roles. The evaluator, as evaluator, is likely to make a contribution only to
the conduct of policy analysis. But in government as elsewhere, the
possession of knowledge can bring with it a certain kind of power. To the
extent that the evaluator goes beyond his professional practice and with
superior knowledge also earns the confidence of political leaders, exercises
political judgment, and acquires the additional skills of a practiced
political observer of the present bureaucracy and an uncertain future, then
he will contribute to every facet of the policy process. But in doing so, he
will also become less an evaluator in any limited professional sense and
more a political leader or public servant in a quite old-fashioned and
conventional sense. His main characteristic will not be the possession of
technical skill. It will be the possession of civic virtue.

FOOTNOTES

The footnotes which follow are citations of points in the six SEA case
report chapters that either provide the practical illustration of points made
in my conceptual analysis or have actually provoked points made in the
analysis. The underlying principle is that a philosophical analysis of such
matters should, on the one hand, illuminate what practitioners do and say
about their work; but, on the other hand, it should also arise from a serious
study of what they say and do. Philosophy, in this sense, is simply the
explication of everyday life. In the footnotes that follow, reference is
made to the preceding chapters by the authors' last names.

1. Bracey gives about as good a review of the confusion surrounding
the term "policy" as I have seen. He, together with Gold and Sandifer,
observes that policy is sometimes framed without "clear, procedural
implications" and also that procedures are sometimes viewed as policy.
Ascher tends to see policy as pretty much limited to authoritative rules of
procedure, and nearly all of the authors, at one time or another, speak of
policy as something akin to "personal policy of a person in authority."

It has not seemed to me fruitful to aspire after too strict a definition
of "policy." But to see what it means in the context of practice might be
useful. Hence, I have tried to avoid what would appear to be a fruitless 
"academic" exercise of little practical significance by asking "What 
constitutes a policy question?" instead of "What is policy?" 

2. Do all policy questions have these features? I think they do, but
the SEA chapters do not clearly reveal that fact. On the contrary, it is
pretty difficult to find any policy questions really carefully formulated in
these chapters. What one finds, by implication, is a range of descriptions
of technical and political problems surrounding some significant events in
the history of State Department activities. The values in conflict are
almost never fully drawn out. But they can be discerned inductively at
work in the narratives provided.

For example, both Bracey and Gold remark, but in different ways, that
when there is consensus, it can be decisive in resolving policy questions.
But they also indicate that, in effect, this proviso amounts to saying, "If
all people agree on what is to be done, then it is no longer a policy
problem" (Bracey, p. 14). The implication is that a conflict between values
in which the policy issue resides is an essential feature without which such
questions would not be serious.

Sometimes we observe from the narrative the points at which the
"nested values" of the policy question begin to emerge through time. (See
Bracey, pp. 26-27, where they begin to emerge, but do not ever take the
shape of a well-formed policy question. See also Bracey, pp. 28-29. See 
also Donovan and Rumbaugh, pp. 39-41, where it becomes apparent that 
the policy issues arise as larger numbers of goods are permitted to enter in 
conflict through constitutional change and as their presence becomes more 
evident. Again, the same is evident in Donovan and Rumbaugh on p. 48, 
and pp. 55-56.) 

This feature of policy questions generally is also expressed in Gold, p.
120. It is the essential requirement that he lists there as Contextual
Factor #3. The same can be seen in the South Carolina experience over
legislation proposed in 1977. As Sandifer tells it, the policy question, to
the extent that there was one, was nested in conflicting values on the part
of legislative representatives. 

3. The experience referred to here is the experience over the exit
standards at the secondary level in Florida and the account given by Bracey
on identifying the "cut-off score" for the initial tests in Virginia. In that
latter case, consider the extent to which political considerations would,
and probably should, enter quite beyond any considerations of technical
decision, and the ways in which the necessity of those considerations
reflects values in conflict and defines the policy question. In short, as
Bracey describes it, the policy question was not really whether there would
be a cut-off score (or what it would be) but how to strike the appropriate
balance between conflicting values at issue in that decision. The result
was initially a decision without data. There followed a readjustment of
that decision, and it ended up being very different in appearance, but not
very different in consequences, I suspect, from what existed before. (See
also Sandifer, pp. 97-98.)

4. Is there any evidence in these papers of the assumption that people
would agree on policy if they could only agree on goals? I am not sure. It
is interesting, however, that those papers in which policy is seen to be most
nearly associated with management, monitoring, and issues of compliance
(Donovan and Rumbaugh, Ascher, and Rasp) are also the papers within
which issues of policy are most explicitly perceived as issues of
management and administrative guidance.

5. This observation running through this chapter is clearly recognized,
though not this explicitly, in Sandifer in his extended comments on the
problems of framing evaluations as though they were addressed to
academics. (See Sandifer, pp. 94-95, and again p. 96.) But this point is also
explicitly addressed in Bracey.

6. In Bracey's account of the program in Virginia, he provides clear
instances in which policy decisions are made without data, with the
suspicion that decisions might have been better with it, but with another
suspicion that they might not be different in either case. Still, the view
prevails both in Bracey and in Donovan and Rumbaugh that the possession
of such data not only makes decisions more rationally persuasive, but may
be necessary because of statutory requirements and for allocational
decisions even when it does not produce a different result.

7. This notion that a well-formed policy question always contains
these kinds of questions is not something that is well displayed in the
preceding chapters. It would be interesting to take either a truly serious
policy question (such as is implied in the personal goals of the SPI in
Washington or issues of remediation as displayed in the behavior of the
system in Virginia) or a matter of procedure as implied by policy decisions
in other of these papers, and really examine what the policy question is,
rather than, as is usually the case, consider merely the result of a policy
decision and describe its operation.

8. Can one find in these six chapters the residue or evidence of
Utopian thinking? What would it look like? Well, for one thing there is in
general a failure to take seriously in these chapters the presence or the
need for a political process. Where the presence is acknowledged, as in
Bracey, Gold, and Sandifer, there is also, it seems to me, a general failure
to take that role seriously. Exception would be the Virginia paper. But
Sandifer presents an interesting test of this. On p. 96, he explicitly
acknowledges the necessity of taking the political process into account in
trying to understand the role of evaluation. But he regards such influences 
as "external" (to the Department, I assume) and not as essential. The same 
ambiguity is expressed on p. 93. 

Bracey's paper presents another interesting, sensitive and
sophisticated example of this kind of acknowledgement. But I 
wonder— would it be Bracey's opinion that things would be better on the 
whole if such "non-logical" behavior were not so influential? It would be 
the Utopian impulse to say "Yes" and stop there. I suspect that Bracey 
(also Gold and Sandifer) would go on to answer that question with a "Yes, 
but . . ."

9. I think it would be interesting to consider what features of
evaluation within the operations of government regard the preservation of
politics as something of an intrusion. Why is it not the view of evaluators
that, after all, the play of political forces is the primary and only essential
method of evaluation that we really have to preserve? 

I believe that the answer to this question is that evaluators share a 
kind of Utopian vision in which rational decisions replace political 
decisions. That is to say, the maintenance of politics is seen as an obstacle
to the conduct of effective evaluation and an obstacle to making
evaluation contributions to social decisions. In short, politics tend to be
viewed as replaceable by management and management tends to be viewed
as something that should be guided by evaluators.

10. The chapters almost uniformly testify to the claim that the role
of evaluation in state departments of education is to serve the ends of
management and to keep politicians out of trouble. (See Rasp especially,
on the last point.) This tends to be viewed by evaluation theorists as a
defect, but it is usually viewed by those in state departments as their
normal, natural, and rightful role.

11, 12. All of the papers, with the possible exception of Ascher,
comment on the stringent boundaries of time under which evaluators in
state departments operate and the ways in which this fact marks a
substantial contrast between what has to be done in the context of
government and what can be done in the academic setting. Clearly, they
are in agreement that this is one of the major differences that tends to
make evaluation theory of little relevance to evaluation practice, at least
as it occurs in State governments.

13. It is interesting, I think, that in none of these chapters is there
any detailed story about framing a "better than" kind of judgment in the
case of policy analysis. Rasp remarks that the section in Washington State
never performs policy analysis. The same could be said of Ascher's
account. But this is probably an expression of the fact that well-formed
policy questions seldom surface in the context of State educational policy. 
That is, as I have observed already, the full explication of goods in conflict 
virtually never comes to the surface within state governmental affairs. If 
it did, as a matter of course, then there would have to be stories about 
"better than" kinds of judgments in the accounts given. 

14. Policy analysis necessarily involves the study of marginal utilities
and really excellent policy analysis (which seldom occurs) would require a 
study of net marginal utilities. One step in that kind of analysis will 
always be simply "finding out what happened"; in other words, it requires 
the study of effects. Note: "Study of effects" is what nearly all 
experimental and virtually all intervention designs are about. Effects can 
be "greater or less" in one area than another, but the "study of effects," 
though an essential step, will never add up to a study of worth. Ascher 
gives a good illustration of this when he describes the circumstances in 
Oregon by which any decision of worth is "kicked downward" in the 
system. Studies of "effects," though essential as a step in policy analysis, 
remain nevertheless only a step. Evaluators in state departments 
contribute to the satisfaction of this step, but they cannot, by that means 
alone, complete a policy analysis. 

15. None of the chapters reflect the occupancy of a position
sufficient to determine the actual formulation of policy. 

16. The theory of policy formation could be described as the theory
we use to predict the behavior of political and bureaucratic leaders. All of
the chapters, but especially Bracey and Rasp, recount the significant
difference that is created by the personalities of bureau leaders. 
Evaluation practice is substantially governed by the behavior of political 
leaders. So the theory of policy formation, as the theory of inter-agency 
politics, is likely to be expressed in the descriptions that political officials 
give and the descriptions that their subordinates give of their own personal 
qualities, and their own personal aims as actors in the political arena. 

17. Notice that "political weight" is different from "rational weight."

18. The set of policy choices is also established by what are
sometimes called "peremptory rules." (See Braybrooke and Lindblom, A
Strategy of Decision.) Such rules tend to establish the moral limits within
which policy can be selected, but, by the same token, they tend to
guarantee that policies are to be chosen from among alternatives, all of 
which are morally permissible or have worth. In that sense, defining the 
set of policy choices is the expression of moral conviction and value 
estimations, but selecting from within that defined set is not. This is 
partly the reason why, with even the best evaluation data, we are unlikely 
to arrive at policy choices that are substantially different from those we 
would arrive at without such data. The range of choices is already 
substantially set by considerations that define the set of alternatives, and 
those considerations, being peremptory, do not permit a very large range of 
differences to arise. 



CHAPTER 8 

The Context of Evaluation Practice in 
State Departments of Education 

Nick L. Smith 

If you are going to develop a theory in evaluation, you had better 
know what's really going on in evaluation. You have to know the
studies and you have to know the tradition of the people who are 
in the field (House, 1979, p. 150). 

In spite of considerable writing on the theory of evaluation in 
education, there has been little study of actual evaluation practice, House's
admonition to the contrary. There is almost no writing in the professional 
literature on what evaluators actually do or on the nature of the settings 
within which they work, especially for evaluators within local school 
districts and state departments of education. In fact, most writers on 
educational evaluation are university-based researchers who tend to 
participate in evaluations as third-party consultants, only infrequently 
conducting evaluations from within bureaucratic settings. Rich (1979) 
suggests that new information is accepted into an academic setting much 
differently, however, than it is into a bureaucratic setting. Academicians 
are likely to welcome non-threatening information, if it is received at 
minimal cost, but bureaucratic managers are more likely to expect that the 
information be relevant to their own settings and to be concerned about 
what the provider of the information may want in return, how the 
information may embarrass the agency, and so on. The context within 
which evaluation takes place shapes the nature of evaluation practice. I 
believe that current evaluation theory is handicapped by not being 
sufficiently grounded in this context. 

Discussions of methodology in the evaluation literature rarely include 
a consideration of the context of evaluation practice. One might surmise 
that evaluation methodologists assume that 
1. The context within which an evaluation takes place is 
relatively unimportant (there is seldom a discussion of the 
influence of context on the selection or use of methods). 

2. Methodological concerns are of prime importance in 
evaluation work (many authors discuss nothing but 
methodological concerns; economic, political, organizational, 
or legal concerns are seldom mentioned except in 
respect to the use of evaluation results once they are 
produced). 

3. Methodological decisions are determined on the basis of 
technical considerations (the influence of legal, political, 
economic, and organizational factors on methodological 
details is seldom mentioned). 

4. Evaluators have sufficient autonomy and independence to 
decide methodological issues as they see fit (again, one can 
find little literature that suggests otherwise). 

Although it would be difficult to find any writer who has explicitly stated 
these assumptions about evaluation method, it is also difficult to find 
writing which discusses under what forms of evaluation practice these 
assumptions do not hold. I believe that the "testimony" provided by the six 
case report chapters suggests that, at least for many evaluations conducted 
by state departments of education (SEA), these assumptions are not valid. 

The information presented in this book may not strike some who have 
worked in state departments of education or who have studied the 
educational policy scene at the state level as particularly insightful; 
however, that group of individuals does not include many evaluation 
practitioners and evaluation theorists. It is for this latter group, those 
with little direct or vicarious experience with state level evaluation 
operations, that this volume has been written. My purpose in this chapter 
is not to chart the formation of state level policy nor to systematically 
study the operation of state evaluation units. My intent is simpler: merely 
to draw on the preceding six case reports to illuminate the range of factors 
which influence SEA evaluation practice and which often predetermine the 
methodological decisions of practitioners in those settings. 

What is known about the nature of method and the context of 
evaluation practice in state department settings? There have been a few 
recent studies of evaluation practice in education, especially at the local 
education agency (LEA) level (cf. Alkin et al., 1979; Lyon et al., 1978), 
but discussions of evaluation procedures continue without their being 
related to the context of evaluation practice. For example, in a recent 
article, Stufflebeam and Webster (1980) review thirteen alternative types 
of evaluation without touching upon such issues as who uses the various 
approaches, under what conditions, and with what success. Taxonomies of 
evaluation methods would be much more useful if their relevance to the 
settings of evaluation practice were explicated. 

A recent study (Caulley and Smith, 1978; Caulley and Smith, 1980) of 
evaluation practice in state education agencies (SEAs) highlighted the 
great variability in evaluation practice at the state level. For example, 
while some state departments of education perform all state-level program 
evaluations by using inhouse staff, other state departments subcontract 
almost all such studies. Further, some state departments restrict their 
activities to state-level testing, while others engage in a wide range of 
activities from testing to research to school accreditation to evaluation 
monitoring to policy formation. The organization of evaluation units also 
varies; some states have centralized evaluation units, others do not. What 
accounts for the differences in the structure and function of evaluation 
units within state departments? What influences the nature of evaluation 
practice within these settings? Much has been written about the impact of 
evaluation on policy, but how does policy influence evaluation? These are 
some of the primary questions addressed in this volume and are the major 
focus of this chapter. 

Since the nature of SEA evaluation operations varies dramatically 
from state to state, and can vary radically over time within a single state, 
I am less concerned here with the details of what the current SEA 
evaluation environments are like, as represented in the six case report 
chapters, than I am with what general factors shape these environments. 
In the analysis which follows, I attempt to synthesize the individual case 
reports into a broad description of what affects the context of evaluation 
practice within state departments. When it seemed appropriate, I have 
used the authors' own words so that the reader can make his or her own 
assessment of the fidelity of my synthesis. 

In the first section of this chapter I will discuss five sources of 
influence on evaluation practice in state departments of education: the 
influence of the federal government, the influence of state governments, 
the influence of the state agency itself, the influence of local school 
districts, and the influence of other groups. The focus of this discussion is 
on how these various groups and agencies shape evaluation practice. As 
illustrated below, some of these influences arise through formal decree, 
others arise through agency custom which defines de facto policy and 
thereby limits future methodological options. Many factors influence the 
operation of a state agency evaluation unit, including: state and federal 
laws and regulations; formal policy statements and position statements; 
administrative rules and procedures; standard agency practices; resolutions 
of controversies, hearings, and lawsuits; statements of special interest 
groups; budget reviews; the desires of key personnel; and so on. 

In the final section of the chapter I will summarize the view of 
evaluation practice presented by these six case reports and discuss the 
implications of this view for the improvement of evaluation practice and 
theory. 



INFLUENCES ON SEA EVALUATION PRACTICE 

Influence of the Federal Government on SEA Evaluation 

As the agencies responsible for state compliance in education, the 
state departments of education have always been affected by federal laws, 
regulations, and court decisions. However, the Elementary and Secondary 
Education Act of 1965 mandated a more active role for SEAs in the 
evaluation of educational programs. In the early years this requirement 
was seen by SEAs as a federal "reporting" activity required for receipt of 
federal funds. Subsequently, however, some SEAs used the ESEA impetus 
and funds to centralize program evaluation activities and to increase their 
evaluative capability. Subsequent federal legislation has specifically 
required evaluation at the SEA level, and SEA monitoring of evaluation 
conducted at the LEA level. Considerable effort is still expended by state 
departments of education, however, in preparing and filing reports to meet 
federal requirements. This is no minor task. As Rasp indicates: 

More recently with the impact of P.L. 94-142 and the mandated 
individualized educational programs for handicapped students, 
the evaluation staff has been working primarily to assist in the 
development of a computer processing system for management 
information, including an emphasis on organizing monitoring and 
evaluating data (pp. 69-70). 

Although each state department of education has responsibility to 
evaluate its own programs, it is often the federal influence that determines 
which programs are ultimately studied. 

Each (state department) office which funds programs operated 
by school districts has, or assumes, the responsibility for 
evaluating, or monitoring the evaluation of, all programs which 
it administers. The determination of which programs are 
actually evaluated is, more often than not, a function of federal 
mandates. The offices most heavily impacted by federal 
mandates for evaluation are Federal Programs, Adult Education, 
Vocational Education, and Programs for the Handicapped 
(Sandifer, p. 86). 

Of course, it is in the evaluation of Title I programs that the federal 
influence has probably been strongest. Gold, writing from the perspective 
of the Wisconsin State Department, recounts the sequence of events as 
follows: 

One of the key areas of (federal) influence is in evaluation of 
programs. A prime example is Title I, in which evaluations have 
gone from locally designed to more rigorous federally mandated 
requirements. At the outset, local districts were permitted 
great leeway in how their program evaluations were designed and 
implemented. The goals and objectives were strongly 
encouraged and the evaluation instruments were left up to the 
local district. As a result, evaluations ranged from excellent 
through adequate to totally inadequate. As the inconsistency in 
the quality became more evident, the federal guidelines for Title 
I evaluations were tightened to make them more consistent with 
good evaluation practice. 

In addition, Congress began to seriously question the 
effectiveness of Title I funds. This questioning led to changes in 
evaluation requirements and subsequently to evaluation 
practices. When faced with measuring the impact of Title I, 
educators were unable to aggregate Title I evaluation data from 
across the nation and gauge their effectiveness. Instead, case 
studies and anecdotal data were used to defend or attack the 
massive expenditure of funds. Subsequently, Congress mandated 
that a method be developed to report on the impact of Title I to 
Congress. 

As a consequence, Title I developed four models for evaluation 
which generated data that could be aggregated at the state and 
local levels. This strategy limited the evaluation instruments 
that were required (LEAs could supplement), the sequence of 
evaluation events, and, to a certain degree, the content required 
to be evaluated. In essence, the Title I requirements limit the 
required evaluation strategies, curricular content, and test 
instruments . . . (Gold, p. 106). 

This extensive federal involvement in SEA evaluation, even to the 
point of specifying evaluation procedures, does not seem to be lessening. 
The U.S. Office of Education is currently developing additional evaluation 
models for use with migrant programs and with programs for neglected and 
delinquent children, as the office has already done for Title I programs. 

The federal influence also can carry over into nonfederal programs, as 
Sandifer reports has occurred in South Carolina. 

Beyond the effect that specific program policy, e.g., Title I, may 
have on evaluation methodology as applied to that program, the 
effects often carry over into other programs. For example, the 
comparability and nonsupplanting requirements of Title I, 
coupled with the Office of Civil Rights regulations which 
prohibit grouping that results in the formation of racially 
identifiable classes, virtually prohibits the use of experimental 
or quasi-experimental design in evaluating programs that may 
have little, if any, relationship to the federal programs which 
have placed constraints on evaluation in general (Sandifer, p. 90). 

There is little doubt that past federal legislation has not only 
increased SEA activity in program evaluation, but has often determined the 
nature of those SEA activities. Many SEAs spend considerable effort in 
complying with federally mandated evaluation requirements for federally 
funded programs. The federal agencies tend not to explain how they intend 
to use the required data nor do they report back on their analysis of the 
data. In the view of some observers, seldom have the results from these 
federally mandated evaluation activities contributed directly to the 
improvement of SEA operations. 

Because of the visibility of federal programs, many evaluators are 
already aware of their influence on state department evaluation 
operations. The influence of state governments on SEA evaluation 
operations is less generally known, however, and that is the topic to which 
we now turn. 
Influence of State Governments on SEA Evaluation 

There are various elements within state governments which influence 
SEA evaluations; a discussion of the major elements follows. 

The state legislature, and the Governor's office, frequently have 
profound impact on the nature of evaluation operations within state 
departments of education. In some cases the state evaluation unit cannot 
collect data unless such collection is previously authorized by the 
legislature. The legislature even mandates the definition of evaluation, as 
in Virginia, for example, specifying that evaluation is to mean testing, with 
sole reliance on test scores (see Bracey). Even technical decisions, such as 
setting cutoff scores, can be elevated to the level of policy questions and 
incorporated into state legislation. (On the other hand, politically sensitive 
decisions that no one wants to make may be left out of policy 
deliberations, to be handled subsequently as "administrative considerations.") 

The special interests of certain legislators and the climate of public 
opinion can be very influential in the drafting of evaluation-related 
legislation, such as the state accountability laws, the statewide testing 
laws, and the state minimum competency graduation requirement laws. In 
many cases the state legislatures have not only mandated such statewide 
programs, but have indicated what subjects were to be tested, at what 
grade levels, how often, with what types of tests, and how the results are 
to be reported. Similarly, in some states, legislation has specified which 
tests were to be used in the certification of teachers, and which to be used 
in teacher training and teacher review. In such cases, traditional concerns 
of measurement and research design have been made moot through state 
legislation which mandated answers to such issues. Regardless of the 
technical adequacy of these legislated decisions, SEA evaluation units are 
forced to employ such methods or fail to comply with the law. In other 
cases, legislation mandating evaluations is not so prescriptive, and results 
only in general guidelines which must be followed by SEA evaluation units. 

Gold talks about the ways in which state legislatures influence 
evaluation within SEAs. 

State educational policies affect evaluation in three ways. First, 
the policies may determine that no evaluation take place. This 
is usually accomplished by leaving the requirement for 
evaluation out of legislation and the budget. Thus, the program 
is implemented and a general fiscal accounting is done, but no 
performance evaluation takes place. 

Second, legislation and/or budget documentation may be very 
prescriptive in determining the evaluation of policy and 
procedures. In such cases the evaluation requirements are often 
spelled out in detail regarding (1) process, (2) instruments, 
(3) timelines, and (4) reporting requirements. This situation 
places severe limitations on the SEA, but increases the 
probability that the legislature will have its evaluation policy 
carried out. 
Third, the legislature/budget requires that the SEA evaluate 
specific programs that are being supported by state funds. The 
directive to evaluate is often vague, sometimes ambiguous, and 
always open to interpretation as to legislative intent. 
Procedurally, the CSSO [Chief State School Officer] assigns a 
program to an individual to administer. The nature of the 
program and evaluation will then be determined by the 
personality, politics, and program priorities of the program 
director and his or her superiors who have veto powers (Gold, 
pp. 107-108). 

Subcommittees are often a key element in this legislative influence on 
evaluation. Bracey discusses the role of legislative subcommittees in 
writing the "standards of quality" used to assess education in Virginia, in 
which the subcommittee shifted the focus of attention from educational 
inputs to educational outcomes, stipulating testing procedures which 
required feedback to individual teachers on the performance levels of 
individual students. These standards thus codified the Back to Basics 
Movement for Virginia and provided for ". . . perhaps the most ambitious, 
comprehensive program of diagnostic testing in history" (Bracey, p. 22). 
How were these standards, which had such tremendous impact on 
evaluation within the Virginia State Department of Education, produced? 

. . . While certainly the standards, conceived entirely within the 
General Assembly, are not to be taken lightly— the objectives 
and commensurate testing program are at present in 
place— there is good reason to believe that the legislature was 
not fully aware of the implications of what it was doing. The 
committee had been advised by one legislative aide, untrained in 
psychology or education. While that aide read a great deal of 
background research, and while the committee as a whole 
learned a great deal about testing, the final report of the Joint 
Subcommittee is a melange of the Zeitgeist, theory, errors, and 
naivete . . . The Department of Education's involvement in this 
policy change was nearly nil. Indeed, the Department had been 
operating independently on its own initiative. The Director of 
Program Evaluation had convened in 1975 a State Testing 
Committee made up of Local Education Agency (LEA), State 
Education Agency (SEA), and Institutions of Higher Education (IHE) 
personnel to propose a comprehensive testing program for the 
state. Much, though not all, of their work was rendered moot by 
the actions of the legislature. (The State Testing Committee did 
not make its final report until December, 1976, some nine 
months after the action of the legislature.) (Bracey, pp. 22-23) 

It was thus through these procedures that the state legislature produced 
standards of quality for Virginia education which effectively defined 
evaluation in Virginia as synonymous with testing. 

Obviously these "legislature-designed" evaluations can create 
considerable difficulties for SEA evaluation personnel who must attempt to 
perform such evaluations. These legislative mandates may change the 
nature of evaluation for a given program or have an even more wide-ranging 
impact. For example, Donovan and Rumbaugh discuss how a 
change in the Michigan State Constitution altered the authority and 
membership of the State Board. The new members brought greater 
interest in evaluation and an action orientation which radically changed the 
nature of SEA evaluation operations throughout the state. 

One of the common problems encountered by SEA evaluation units is 
that legislatively mandated evaluations frequently involve short timelines. 
The lack of time to conduct adequate studies results in an SEA focus on 
short-term outcome studies instead of longitudinal attempts to look at 
long-term impacts. State legislatures may also mandate that SEA units 
monitor and certify the performance of LEA's evaluation activities, thus 
forcing the SEA evaluation unit into a monitoring or audit role rather than 
an evaluative role. Some laws specify certain types of correlational 
analysis which dictate evaluation methods that wash out important 
individual differences (see Rasp). In other cases, rules regarding the role 
of such support services as the state printer can discourage the use of the 
most appropriate reporting formats. Further, the need to use evaluation 
data to justify the continuation of desired educational programs within the 
legislatively approved budget influences the nature and timing of 
evaluation studies. 

One of the major difficulties arising when legislatures specify 
technical details within evaluation legislation is that sometimes the legal 
requirements for evaluation exceed the evaluation technology currently 
available. 

For the past two years the State Board of Education has been 
required by legislation to certify education clinics organized to 
provide programs for school dropouts, and the (State 
Superintendent) has been required to manage the funding process 
and to evaluate the programs. Because of the special legislative 
interest, the activities are politically sensitive beyond the small 
amount of money involved. The law itself calls for the 
evaluation of superior performance based on educational gain as 
related to the difficulty of educating the students and efficiency 
in terms of per pupil expenditures. The demands for evaluative 
precision outstrip the current state of the art (Rasp, p. 70). 

A related problem occurs when social and educational goals are 
combined within the same piece of legislation. 

[Some] programs have a mixture of social action and education 
priorities. For many reasons, categorically funded programs 
often have a multiplicity of apparent purposes; some establish 
primarily education priorities while others establish primarily 
social action priorities. For example, legislation may contain 
language which seems to equate civil rights and basic skills 
education. It is not uncommon for these social action and 
education priorities to be so closely intertwined that it becomes 
virtually impossible to distinguish among them (Donovan and 
Rumbaugh, p. 53). 
Legislative disappointment and frustration with the state evaluation effort 
surfaces when legislators learn that social action objectives cannot be 
assessed by educational performance measures. 

It should be noted that the legislative influence in SEA evaluation 
operations is not always an externally imposed influence since SEAs often 
seek legislation authorizing them to perform certain actions and providing 
them with funds. For example, the Michigan State Assessment Plan was 
initiated by the State Superintendent and included in the Department's 
appropriations bill in 1969 as a result of the Superintendent's efforts. The 
original proposal made to the Superintendent was drafted by SEA 
evaluation unit staff. State department of education evaluation units often 
participate in the preparation of legislative bills, since that is frequently 
the only way to obtain funds and authorization to perform the work they 
seek to do. Thus, while state legislative mandates can dramatically affect 
the nature of SEA evaluation practice, SEA evaluation units also seek to 
influence their own operations through the creation of legislation. Many 
SEAs are active participants in both state and federal law making. State 
legislation concerning evaluation often results because the SEA won or lost 
a battle and not because of a lack of interest or involvement in the law 
making process. 

Individual legislators can significantly affect evaluation practice 
through personal contact, as well as through formal legislation. The 
political acceptability of evaluation methods is crucial for the successful 
operation of SEA evaluation work. The influence of legislators in technical 
decisions is evidenced in the following quote from Donovan and Rumbaugh. 

Some groups saw the questions in the "General Information" part 
of the [Michigan State] assessment as unrelated to the purpose 
of assessment of the basic skills, and even worse, an invasion of 
personal privacy because of questions asked as proxies for 
socio-economic status. The press picked up the complaints of 
educators and parents, and then legislators got into the debate. 
Department staff spent considerable time and effort explaining 
the need for these data, and defending their collection. Finally, 
as time passed and other issues arose, the controversy abated, 
but was to re-arise each year until the State Board in 1973 
directed the State Superintendent to eliminate the 
socio-economic status feature. It was recognized that these data 
were valuable for the proper analysis of the basic skills 
assessment data, but it was just not politically viable to keep 
this instrument as part of the program. The policy decision to 
eliminate it was made on political rather than technical grounds. 
. . . Another controversy the first year was raised by legislators 
at the request of their constituents. They attacked one of the 
reading passages in the test because it was, "A blatant attempt 
to inculcate anti-American and anti-free enterprise values in 
school children." The Department staff discussed these issues 
with the legislators and were able to avoid serious action against 
the assessment program. The compromise solution included 
changing the reading passage for the next year (Donovan and 
Rumbaugh, p. 43). 
The state legislatures, the Governor's office, individual legislators, and 
other state agencies, then, can all influence how evaluation is performed at 
the state level. We next consider the influences of the state department 
itself. 

Influence of the State Agency on SEA Evaluation 

The policies and procedures of the state department of education in 
which the SEA evaluation unit is housed also shape the nature of SEA 
evaluation work. Agency policies, regulations, rules, and procedures 
determine such things as the organizational structure of the evaluation 
unit, the size and qualifications of the staff, the communication channels 
to be used by the unit, the size of the unit's budget, and the unit's fiscal 
accountability. Even the name of the unit may be changed to reflect 
shifting agency policy. For example, Rasp notes how the title of the SEA 
evaluation unit in Washington was changed from "Program Evaluation" to 
"Program Evaluation and Research" to "Testing and Evaluation" to 
"Testing, Evaluation, and Accountability," and back to "Testing and 
Evaluation" over the years to reflect agency policy. The agency's point of 
view also influences the stance the evaluation unit takes, the definition of 
who can legitimately evaluate inhouse programs, the orientation of the 
evaluation unit concerning its proper function, and who the appropriate 
clients and audiences of evaluation are. As a unit of the state education 
agency, it is expected to reflect agency priorities and policies. 

Agency policy also determines how many evaluation staff are to be 
permanent, and how many on special grants or project money, the types and 
numbers of external grants to be pursued, the use of external consultants, 
even which professional associations the evaluation unit should maintain 
contact with and serve as liaison to. Rasp describes the management of 
the Washington statewide testing program as follows: 

The law is implemented through heavy reliance on contracted 
services. To accomplish major tasks such as the printing and 
scoring of tests, logistical services and analysis, requests for 
proposals are prepared and sent to interested bidders. The 
technical proposals submitted are reviewed by outside panels of 
experts working independently. The recommendations of the 
technical review panels are supplemented by the [state 
superintendent] staff analysis of bid amounts; the 
superintendent makes a final decision, and contracts are written 
with successful bidders. In Washington, contracts for $2,500 or 
more require that a competitive bidding process is used. 
Single-source contracts for larger amounts must be justified and 
defended. In the case of contracting with other state agencies, 
for example, universities, educational service districts, and 
LEAs, waiving the competitive bidding process is not difficult. 
However, when agencies other than those of the state are 
involved great care is taken to explicitly follow the rules. 

Since the total professional staff responsible for the testing 
activities is less than one full-time equivalent, contracted 
services are necessary and play a crucial role. The typical 
pattern is one in which large contracts for specialized services 
are awarded on the basis of technical merit and competitive bid. 
The assistance of additional personnel is gained through 
contracts with the other state agencies or school districts. 
Specific tasks are completed occasionally through the use of 
single-source personnel service contracts under the $2,500 
amount. Developing work plans and time schedules, preparing 
requests for proposals, reviewing bids, writing and managing 
contracts are necessary skills for administering the Washington 
testing program (Rasp, p. 68). 

Even the flow of information, which determines the influence of and 
the support for SEA evaluation, is often dictated by agency policy. In 
Virginia, for example, the State Superintendent "signs off" on all 
information going from the SEA to the State Board of Education. 
Administrative regulations specify this procedure, as well as a procedure 
that all contacts with the state legislature must be made through a 
designated Assistant Superintendent. Department members cannot 
personally contact individual legislators. If legislators contact department 
members for information or testimony, the department members must file 
reports with the Assistant Superintendent (see Bracey). Gold also indicates 
that evaluators must be careful not to violate agency policy in terms of 
whom to talk to, about what, and when. "Likewise, the evaluator should 
not promote ideas which are contrary to agency policies, rules, or 
regulations. The intent is to educate and build confidence in the evaluation 
process" (Gold, p. 118). 

The interest of the State Superintendent in evaluation can play a 
major role in the nature of evaluation activities within the evaluation unit. 
An evaluation "advocate" can considerably increase the role of evaluation 
within the agency. 

[Our] State Superintendent during the 1970's was to be the 
driving force behind state efforts in evaluation, and personally 
used the data provided to him ... [to the Superintendent], 
evaluation was critical to managers. He was to define 
educational evaluation as "a process of obtaining, for decision 
making purposes, information concerning educational activities," 
and emphasized his commitment by saying ". . . we're committed
to developing educational evaluation into a fruitful and
productive exercise. We in Michigan are not content to treat
evaluation as that useless exercise required from on high that
takes time and pain to produce, but which has very little
significance for action" (Donovan and Rumbaugh, pp. 42-43).

This superintendent strongly supported the use of evaluation throughout his 
term, requiring that program administrators were never to evaluate their
own programs, but that such evaluations were to be coordinated through 
the central program evaluation unit. He required that any item which 
included plans for evaluation, such as programmatic state plans, include a 
statement of support from the evaluation staff before being submitted to 
the State Board of Education for approval (see Donovan and Rumbaugh).

A state board of education may also play a significant role in
establishing evaluation procedures. For example, Ascher details how the
Oregon State Board mandated the policy for Oregon's minimum
competency program, setting standards defining how evaluation was to be
used to ensure that schools complied with the mandated policy (see Ascher).

Agency policy concerning the organization of evaluation resources
materially affects evaluation practice. In some state departments of
education, the evaluation capability is decentralized so that evaluators are
housed with the programs they evaluate. This arrangement can result in a
lack of common criteria being used in employing evaluation staff and in the
design of evaluation studies across the agency. It also results in variation
in staff expertise and in the quality of evaluation work, including a lack of
consistency in evaluation reporting requirements imposed by the separate
offices on local school districts. The centralization of evaluation can
result in communication problems across the units and in evaluators being
viewed with distrust by other agency personnel, although they may be
viewed as being more objective by outside observers (see Sandifer). For
centralized evaluation units there may be disagreement between the
evaluation staff and program staff concerning the proper focus on program
outcome versus program process or on qualitative versus quantitative
methods. Furthermore, formal agreements are sometimes required which
specify what resources will be provided by the program unit for the
evaluation and what questions or methods the evaluation unit will employ
in its work. These arrangements require negotiation, communication,
clarification of separate agendas, and so on. Such arrangements raise
problems of bias, client resistance, and interagency competition for scarce
resources, all factors which affect the evaluation from its initial design to
the use of its results. Centralized SEA evaluation units which have the
ability to influence directly the State Superintendent or to influence
agency policy in programmatic areas naturally create interagency tension
among the various program units.

Another way that agency policy affects evaluation is by allowing
evaluation studies to be more influenced by policy issues than by technical
concerns. For example, SEA policy interests can dictate methods which
are not compatible with the interests of the local school districts which
manage the programs being evaluated. The state agency is frequently
concerned with generalizability and transportability of programs while
LEAs are more concerned with meeting local needs and assessing local
project effectiveness. Sometimes evaluations are initiated long after a
program has been running in order to determine the basis for continued
support or expansion of the program. This timing necessarily requires ex
post facto designs, and so influences the evaluation methods used.

As mentioned above, agency requests for immediate feedback often
prevent the evaluation unit from conducting long-term impact studies.
Even an agency policy to computerize most evaluation data influences
subsequent evaluation work since it, in effect, dictates the nature of data
to be gathered and reported, the kinds of staff skills needed, and the daily
activities of at least some of the evaluation staff who are required to do
the data preparation and storage (see, for example, Rasp).

Finally, as in any organization, the nature of agency staff positions
can alter the evaluation effort. Whether the agency head is elected or
appointed, whether the top staff are civil service or appointed, and
whether the technical staff are tenured or are state employees can
influence the susceptibility of the staff to external political influence.
High turnover among personnel also tends to create discontinuity and lack 
of concerted evaluation effort. 

Influence of Local School Districts on SEA Evaluation 

While the strongest impact on SEA evaluation operations probably 
arises from the federal government, the state government, and from within 
the agency itself, local school districts also influence the nature of SEA 
evaluation practice. LEAs most directly influence SEA evaluation work 
through their cooperation or resistance to state evaluation efforts. For 
example, Donovan and Rumbaugh provide the following illustration 
concerning the Michigan assessment program: 

Before the test administration period was over, a group of local 
superintendents met to review this "new" state program of 
assessment. These discussions led to action by some thirty-eight 
of them. They ordered that the test answer sheets be held in the 
district and not sent to the scoring service. The press picked up 
the story and the state assessment became a big story . . . the 
program had visibility! 

After two weeks of unsuccessful discussions where state officials 
tried to convince the superintendents to send in the answer 
sheets for scoring, the State Superintendent and the President of 
the State Board of Education sent a joint letter to local 
superintendents and board presidents. The letter cited "Act 38" 
authority for the assessments, directed the submission of answer 
sheets, threatened court action, and offered to discuss the 
superintendents' concerns. The superintendents, though
reluctant to comply, chose not to challenge the state authority 
further. 

In the ensuing discussions, local superintendents raised several 
issues. The major issue was, of course, the intrusion of the state 
into local school affairs. Each of the seven or eight meetings 
between department staff and superintendents began with this 
issue and required a rejustification of state assessment and the 
state authority. . . . After nearly seven months of monthly 
meetings, the superintendents, though still not satisfied, decided
further discussions were unnecessary. They would cooperate in 
the future, and the Department would form an advisory council 
to help form the future of state assessment (Donovan and 
Rumbaugh, pp. 45-46). 

In addition to the issue of local control, LEAs often raise concerns 
about the relevance of state requirements for local school operations. 
SEA-LEA problems arise when programs are funded on one set of criteria, 
but evaluations are mandated using another set of criteria.

For example, the funds are provided for reimbursement of
program staff salaries, but the evaluation focus is on how much
the participants achieve. This particular kind of conflict may
create hostility among local education agency staff who feel
that it is unfair to conduct a state-level evaluation of those parts
of the program funded locally (Donovan and Rumbaugh, p. 53). 

It should be noted, of course, that many SEAs actively seek the input 
of local school districts in the conduct of state evaluation work. Many SEA 
evaluation units use advisory committees, discussion sessions with local 
superintendents, community hearings, and other mechanisms for obtaining 
local input. Gold comments on the nature of this local input. 

In evaluation [in Wisconsin] most advisory groups opt for
leaving the design and implementation up to LEAs and requiring 
as little extra work as possible to accomplish evaluation 
requirements. This position is in part due to the issue of control, 
but may very well reflect a feeling on the part of LEAs that 
evaluation data either are not, or cannot be, utilized enough to 
justify increased demands on the time, energy, and money of a 
local district's staff (Gold, p. 107). 

Influence of Other Groups on SEA Evaluation 

Other groups also exert influence over the nature of evaluation
practice within state departments of education: the public, the press, and
special interest groups all, at times, shape evaluation policy. Individual
personal contact by opinion leaders, pressure groups, task forces, or
advisory groups established by the governor, the legislature, or the SEA
itself "advise" on evaluation policy, and professional organizations of
business people, teachers, and administrators frequently lobby for 
particular items concerning state level evaluation. The tenor of the time 
and public interest and support for education in general also influence the 
strategies used by the evaluation unit. On politically sensitive issues 
various non-agency personnel, such as the press, the Governor's office 
staff, university staff, legislative analysts, and others occasionally want to 
reanalyze raw data from evaluation studies and thereby seek to influence 
the release and interpretation of evaluation results. 

The desires of special interest groups can affect the climate within 
which evaluation activities are performed and can determine how data are 
collected and released and the various types of clearances that must be 
obtained. 

Donovan and Rumbaugh discuss how public reaction to the release of 
results from the Michigan assessment influenced evaluation procedures. 
The state department had promised that the test results of individual
school districts would not be released. However, after inquiries from the
press and threats from an influential legislator that he would mandate the 
release of the data and even provide guidelines for their release, the 
department recanted and released the information. "Even today, ten years 
later, the promise which couldn't be delivered, i.e., no public release of 
school or district results, is remembered by some superintendents" 
(Donovan and Rumbaugh, p. 46). At first the results were released in
response to individual requests, but as interest grew the department
published the results for all districts. This occurred for two years but "the
'heat' was too much and the results from 1973 were released on request,
but no compilation of all districts were released" (Donovan and Rumbaugh,
p. 46). The use of the results was also of concern. 

The comparison of schools and districts on assessment scores 
alone concerned school administrators. They carried their 
dissatisfaction to key legislators as well as State Board 
members. Under pressure from the legislators the Department
initiated a large campaign to assist local educators in the proper 
and full reporting of results. Advocated were early reporting, 
and reporting in the context of other information about 
education, i.e., the financial, staffing, and other conditions of 
education. The idea was to put the assessment scores in a larger 
context to provide for a fuller understanding and a better 
"evaluation" of the schools than a simple judgment made on one 
set of test scores (Donovan and Rumbaugh, p. 47). 

There is case after case of special interest groups, the press, and the 
public becoming concerned about evaluation results in education and, 
through various procedures, influencing the nature of those evaluation 
activities. Donovan and Rumbaugh provide another example concerning 
the Michigan assessment program. In response to local criticism that the 
norm-referenced tests were of little use in instructional improvement, 
objective-referenced tests were developed. But these tests resulted in 
large sets of scores for each school, and 

. . . When the first reports were released, the press and state 
officials were confused by the many figures. They wanted to be 
able to tell whether or not schools were doing better than last
year, and which were "good" and which were "poor" achieving 
schools. There was a demand for a simple summary type report. 
The State Superintendent asked for a single score (Donovan and
Rumbaugh, p. 49). 

The SEA evaluation unit responded by providing a summary of the 
proportion of pupils showing mastery of more than 75 percent of the test 
objectives and three other such categories on each test, plus an historical 
report of the same data to show progress. 

Another influential group is the press. The press is frequently a 
catalytic agent in attempts of various groups to change evaluation policy. 

It is quite possible that all testing except that prescribed by the 
new [Standards of Quality] would have gone by the boards in 
1976 had not Virginia's intermittent policy making body, the
press, jumped into the fray. In both articles and editorials, 
newspapers, particularly those in Richmond, the State Capitol, 
argued that the elimination of norm-referenced tests (NRTs) 
would lead, eventually, to chaos. California was cited as a state 
which had changed tests so often that no one knew where the 
state was, what the anchor for scores was. The NRTs were kept 
(Bracey, p. 23).

To some extent, of course, the SEA evaluation unit necessarily draws
considerable public attention to its efforts. Such large-scale public
programs as statewide testing bring considerable visibility to the
evaluation unit, making it the focus of concern over local control in
education. Evaluation data are often used in legislative and judicial
hearings for and against various positions. Further, as the evaluation unit
successfully provides answers to some questions, the nature of evaluation 
itself leads to many new questions being raised, thus creating new 
expectations for the work of the unit. The attempts of various interest 
groups to shape evaluation practice attest to the importance of evaluation 
as a facet of state and local educational policy. 



DISCUSSION 

It is clear from the foregoing review that influences from the federal, 
state, and local level all work to constrain the nature of evaluation 
practice within state departments of education. Such influences regulate 
the organization and staffing of such units, as well as determine what is
evaluated, what methods are used, and who communicates with whom.
These chapters portray SEA evaluation units as engaging in many
activities, "evaluation" often being operationally defined to include
monitoring, impact assessment, information preparation, research
synthesis, planning, and policy preparation, with the majority of attention
and effort devoted to testing activities. Little attention seems to be paid
to causal studies or the assessment of worth. Further, there is little
explicit attention to the analysis of values or value claims, although there
is a great deal of attention to political analysis. These units are 
characterized by a lack of autonomy and independence. An SEA evaluation 
unit is one piece of a large organizational enterprise and, as such, staff 
members cannot play the role of "third-party contractors" that is often 
played by their university counterparts. Within these units, political and 
legal considerations are just as important as technical considerations, with 
most evaluation attention being focused on management assistance or 
policy analysis to the general exclusion of the improvement of instruction. 
As a general view, this characterization matches Law's description of SEA 
evaluation units in the chapter which follows: 

They are embedded in a state agency and, hence, are
bureaucratic by definition; they are in a political environment; 
they report to a variety of audiences; they operate under severe 
constraints of time and human and financial resources; they tend 
to be in a reactive mode. The units tend not to be innovative in 
their approach to evaluation since they operate under legislative 
and regulatory mandates. Departure from conventional custom 
and procedure is perilous and, thus, the mode of operation tends 
to remain as it was a decade ago (Law, p. 182).

A few SEA evaluation units are, of course, more proactive and innovative 
than others, but they tend to be the exceptions.

To be effective, evaluators within SEA evaluation units have to have
the confidence of various parties, including SEA top management, SEA 
program staff, budget analysts, legislative staff, professional 
organizations, local school staff, members of the news media, and many 
others. The nature of the "agency role" influences the effectiveness of 
evaluation within the agency. If the evaluation staff do not understand the 
larger picture within which they fit, then the evaluation will not have 
maximum utility. One detects in these chapters a strong orientation to 
evaluation as a continuing function rather than attention to individual 
evaluation studies. In contrast, most of the evaluation literature treats a 
single evaluation study as if it were the beginning and ending point of the 
evaluation enterprise. 

When asked to describe the nature of their evaluation operations, all 
of these authors placed their current efforts within an historical context, 
describing how the current activities and organization came to be and the 
rationale for its current activities. While these evaluation managers are
concerned with providing evaluative information of high technical quality,
they appear to be more concerned with integrating the evaluation function
within the larger enterprise of educational policy and practice. 
Consequently, they are often concerned with doing better political analyses 
and with finding more effective technical analysis procedures which 
incorporate appropriate political considerations. 

The enumeration of the various influences shaping evaluation policy, 
and all educational policy for that matter, weakens the view of evaluation 
as a research-like technical enterprise devoted exclusively to the provision 
of information for use by a single decision maker. Educational policy can 
be seen to flow from a more complex process influenced by various social, 
political, and legal considerations, as well as technical concerns. The role 
of the state evaluation unit is much more than that of a simple provider of 
technical information. The formation of educational decisions and policy is
much too complex to be represented under the "informed decision maker" 
view. These chapters portray evaluators in state agencies as assisting in a 
highly interactive process through which management decisions get made 
and policy is formed. While these units may perform discrete evaluation 
studies, it is not these activities but the larger educational enterprise
which is of most concern to these evaluation managers. 

This characterization of evaluation suggests, therefore, that the skills 
needed for effective evaluation within state departments include much 
more than technical skills of design, data collection, and analysis. The
ability to communicate and persuade in a highly politicized environment is
an essential skill. Astute budgetary and financial analysis, problem
definition, understanding of the state context (rather than concern with
nationally generalizable data), and the ability to know what can be
affected and how within the state setting (political analysis) are needed for
effective evaluation within state departments. The characterization of
evaluation as persuasion (House, 1977) seems a much more accurate
description of evaluation in state departments than does evaluation as field
research. 

I am not claiming that all the factors discussed above influence all
evaluations in state departments all the time. But the evidence from these
six chapters, and from related work with SEAs, suggests that these factors,
in total, have a much greater influence on the nature of SEA evaluation
practice than the technical issues of evaluation methodology which are
most often discussed in the evaluation literature. As I have already stated, 
there are wide variations across state departments in terms of the nature 
and focus of their operations, but they all share the common condition that 
political, economic, legal, and organizational influences are much more 
important in determining what they do than are technical considerations. 

It appears, therefore, that evaluation theory which is predicated on 
the view of evaluation as a narrow conceptual activity of informing 
decision makers and as a variety of field research where technical issues
are of prime consideration, is not likely to be of much utility in improving 
the practice of state department evaluations. This view is simply not 
compatible with the context of SEA evaluation which is portrayed in these 
six case reports. Evaluation within these SEAs is not social science field 
research nor assessment of worth, but testing, management, and policy 
formation conducted within a highly complex social setting. To improve
evaluation methods and theory, perhaps more attention should be paid to 
understanding how evaluation functions within organizational and social 
contexts rather than to elaborating on concerns of causal modeling and 
experimental design.

FOOTNOTE

An earlier, slightly revised version of this chapter is to be
published in Educational Evaluation and Policy Analysis, in press.

PART III.

Prospects for the Future 



This concluding part of the volume takes a look at the future. Alex 
Law of the California State Department of Education discusses the need 
for new approaches in state level evaluations based on the nature of
evaluation in these settings. He provides a short history of evaluation,
citing the disillusionment with experimental methods, and discusses the
lack of impact of evaluation on decision making.

Law urges attention to new approaches in three areas: methodology or 
design, the political or policy context, and actual use. He indicates that 
state-level evaluators should provide both federally required information
and the information needed by decision makers. With an understanding of
and appreciation for the context and political ambience, an evaluator can
educate and lead policy makers and program managers to a greater 
appreciation of the information an evaluation can provide. Law comments 
that state agency evaluations are certainly not immune from, and indeed 
may be more vulnerable to, the influences which inhibit the use of 
evaluations. Because most state-level evaluations are annual, summative, 
and quantitatively presented, they seldom are used. 

In the concluding chapter, Norman Stenzel of the Illinois Office of 
Education examines barriers to methodological innovation within research
and evaluation units in state departments of education. Stenzel examines
personal barriers to innovation, such as language differences,
self-censorship, weak technical abilities, lack of vision, and professional
isolation. In addition to these personal barriers there are constraints on
innovation related to being within a bureaucratic institution. Institutional
barriers to innovation include preordinant staffing plans, narrow
expectations of audiences, existence of turfs, insufficient time, and many
other barriers.

In order to gain other views of the barriers to innovation in evaluation,
Stenzel surveyed evaluation personnel in other state departments of
education, soliciting criticism and comments on his findings. With this
additional input, his chapter provides a summary of the problems of
implementing methodological innovations in research and evaluation units
within state departments of education.

CHAPTER 9

The Need for New Approaches in State 
Level Evaluations 

Alexander I. Law 



Program evaluation is now a subject of interest in virtually all fields 
encompassed by the social sciences and has, in the past decade, spawned 
not only textbooks but new journals. Within education, most attention has
focused on a discussion of the procedures and nature of broad-scale
evaluations, such as those commissioned by the Office of Education and the
National Institute of Education. There is also an increasing amount of
literature dealing with methodology and statistical treatment. Although
some attention, mostly in textbooks, is afforded to small-scale and local
educational evaluation, virtually no discussion has been devoted to state 
educational agencies and their work in evaluation. 

The purposes of this chapter are to review where we are in educational 
evaluation in state agencies, to record some asides as to how we got there,
to discuss present evaluation problems, and to offer suggestions for new
approaches to evaluation by state agencies. Full recognition will be given
to the pragmatic viewpoint that state agencies do, in fact, function in an
ambience somewhat different from that of local educational agencies and
certainly quite different from that found in institutions of higher education
and contracting firms.

Caulley and Smith (1978) conducted a survey of state education
agencies to determine the activities of their evaluation units as well as the
problems, constraints, and conditions under which these units operate.
They hoped that by identifying the characteristics and functions of the
activities carried out by the various state educational agency units, they
could discern directions for the development of new methodologies to help
in the solution of problems indicated in the survey. The results of the
survey showed, not surprisingly, that there is a wide variety in the
functions, nature of the workload, and size of the evaluation units. The
problems indicated by respondents do not differ from the problems found in
the general evaluation community. These difficulties include methodologies,
reporting evaluation results, and the impact and use of the
evaluations themselves.

A review of the results of this survey provides not only a sense of the
variability in the status of evaluations in state agencies but indicates that,
although the majority seem to be adhering to the state of the art as it
existed a decade ago, many of the agencies are growing and seeking ways
to improve their functioning. Again, not surprisingly, one of the major
problems was with the use of the evaluations by their various audiences.
This problem exists throughout the evaluation community and is not unique
to the state agencies. Interestingly enough, none of the state agencies
reported problems peculiar to their situation within a political
bureaucracy. One also gets the feeling from the survey results that, while
evaluation activities at the state level are evolving, they have not yet
reached the level of sophistication found in other organizations which do
evaluations, such as independent contracting firms.

In reviewing the implications of interest in this particular survey, the 
authors have identified eight issues, two of which will be discussed here.
One issue pertains to the use of rigorous methodology by many state
agencies (usually based on a Campbell and Stanley quasi-experimental
design) for their evaluations (Caulley and Smith, 1978, p. 28). States that
use such methodologies encounter difficulties in meeting the requisites for
these types of designs. The second issue concerns the lack of impact
evaluation reports have on the various audiences addressed by state
educational agencies (Caulley and Smith, 1978, p. 29). There seems to be a
general feeling that evaluations have not had the impact that is desired.
This impression is troubling to state-level evaluators. These two findings
are not surprising. Indeed, as will be discussed later, the second may well
derive from the first.

The reporting function is common to all SEAs but the scope and
nature of other evaluation functions vary widely. While some units are
large and deal with complex evaluations, others are little more than
one-person units producing minimal required reports. Nevertheless, they
share many characteristics. They are embedded in a state agency and,
hence, are bureaucratic by definition; they are in a political environment;
they report to a variety of audiences; they operate under severe
constraints of time and human and financial resources; they tend to be in a
reactive mode. The units tend not to be innovative in their approach to
evaluation since they operate under legislative and regulatory mandates.
Departure from conventional custom and procedure is perilous and, thus,
the mode of operation tends to remain as it was a decade ago.

It is worthwhile to take a short historical sidetrip and view the
origins of program evaluation in order to gain some sense of its
peculiarities and why it exists in the state agencies.

HISTORY OF STATE EVALUATIONS

Program evaluation emerged, most people agree, with the enactment 
of the major educational programs of the mid-sixties, notably the ESEA 
Title I and related categorical programs. No one was trained to be a
program evaluator per se. This was particularly true in state agencies and
public school settings. The early practitioners of evaluation in the public
schools looked to institutions of higher education and the guidance they
received came primarily from those whose discipline was experimental
research. Indeed, the early evaluation reports (and to a large extent the
evaluation reports today) relied on the rather conventional experimental
model or variations thereof. The guides to evaluation on which the early
reports were based were the same criteria used in the laboratories of
psychologists and other social scientists. 

Immediately preceding the enactment of ESEA, Congress had enacted
legislation creating a variety of social programs. Few, if any, of these
early programs had built into them the constraints for evaluation that were
built into subsequent programs. The continuing existence of a large 
number of these programs was considered by many as prima facie evidence 
of their worth and fostered the belief that evaluation was unnecessary. 
This view prevailed until Title I was enacted and was changed only at the 
penultimate moment by Senator Robert Kennedy (McLaughlin, 1975, p. 1).

The decision by Congress to require evaluation was new, had to be 
implemented immediately, caught virtually everyone unaware, and, in 
retrospect, found the educational community ill-prepared to provide the 
types of information which Congress desired. 

After its inclusion in Title I, evaluation became valued by Congress 
and subsequently valued by most oversight agencies including state boards 
of education, state commissions and legislatures, local boards of education, 
and the public. It quickly became generally applied to a broad range of 
social programs and was institutionalized in Washington, D.C. and
elsewhere as a necessary component of such programs. Evaluation does
have a worth in its own right and this worth persists despite recent
controversies and the lack of consensus on the nature of evaluation within
the profession itself.



FEDERAL AND STATE EXPECTATIONS 

In the late 1960's, evaluation seemed to take two general directions. 
After the states submitted their original evaluation reports to the federal 
government, there was a massive disenchantment with the nature of the 
reports and their accuracy. Further, it was realized that there was no way 
by which the aggregated information could be sensibly communicated to 
Congress pursuant to the legislative direction. This situation led to a 
number of large-scale commissioned evaluations by the federal
government. These commissioned evaluations, often of extensive
magnitude lasting over three to five years and costing many millions of
dollars, still persist. They persist although their utility and the nature of
their evaluation methodology, design, and findings are fairly constantly
called into question. 

Evaluation was also directed toward the provision of information to 
local and state governmental bodies. The accountability movement can 
quite directly be traced to the need of governmental agencies to gain 
information about the utility, effectiveness, and impact of programs
provided with state categorical funding and the local dollar. 

From time to time there were, and still are, a number of other
demands placed on the evaluation community relative to the accountability
movement. Included were the thrusts of PPBS, zero-based budgeting,
performance contracting, and a miscellany of similar efforts, many of
them stemming directly from the influence of Robert McNamara and
procedures implemented by the Department of Defense during his tenure.
These procedures were sufficiently appealing in the operation of the
Department of Defense that they were generalized very quickly into the
Department of Health, Education and Welfare and into state and local
agencies with, at best, equivocal results. They have now passed, to a large
extent, into disrepute. Nevertheless, each fad that came down the pike
from the federal government was replicated by most states and frequently
by local educational agencies. As a result, we continue to see numerous
movements lying along the same front. The motivation for these
movements lies in demands for information, and in the need for assurance
by the funding agencies that their dollars are being used wisely and that
the effects are what the proponents of the educational plan or program
envisioned. The promise that evaluation held out was not fulfilled.
Because the desired answers were not forthcoming, Congress and other
agencies, particularly the Office of Education, became increasingly
restless. Instead of seeking alternative ways, they increased the pressure
to do more of the same. Accordingly, we had more large-scale contracts
and more upward aggregation attempts, all of them resulting in inadequate
information.

This unfulfilled promise led in turn to a crisis of confidence in the
evaluation process itself, and it is this process that we need to examine.
As mentioned, when the evaluation effort burst upon the scene, few were
prepared. There was not a discipline which could be called program
evaluation, yet there was an expectation on the part of the policy
community that these early evaluations would shed light on the quality and
progress of the commissioned programs. Early evaluations did not do so
and, for the most part, they do not do so today. Guba (1972) stated:


When the evidence produced by any scientific concept or
technique continually fails to affirm experimental observation
and theory arising from that observation, the technique may 
itself appropriately be called into question. It shall be the 
burden of my remarks that evaluation as we know it has failed, 
and that the world of evaluation does indeed require reshaping. 

Guba (p. 265) went on to say: 

The primary task in evaluation today is the provision of sensible
alternatives to the evaluator. The evaluation of educational
innovations awaits the modernization of the theory and practice
of the evaluative art. We need, then, a technology of evaluation
(Guba, 1972, p. 265).

Is there any hope that this modernization will occur soon? I 
believe that there is a great deal of reason to be hopeful. 

At the time Guba was writing the above, there was, indeed, movement 
in the field by several theoreticians, among them Alkin, Cronbach, Stake, 
Stufflebeam, Scriven and others. While the evaluation community was
waiting for the fruits of these labors, state-level evaluators plodded ahead
using whatever was available and became increasingly frustrated. They
were aware that few, if any, of the icons of experimentalism held real
promise, that randomization was impossible, that there were no control
groups. Meanwhile the pundits argued interminably about appropriate
statistical analysis. Campbell and Stanley became the byword but their
procedures were not much better. State evaluators floundered on the
shoals of experimentalism. The reports they generated tended to be
summative, data oriented, and upwardly aggregated. These evaluations did
not produce change and, indeed, had little use of any kind. They were, for 
the most part, ritualistic, annual regurgitations of aggregated test data. 



BEYOND THE HYPOTHESIS TESTING MODE 

As time passed, increasing reliance was placed on evaluations 
generally based on the Tylerian objective-attainment model. These 
evaluations, like their predecessors, have had little impact. However, 
these evaluation modes proved both frustrating and satisfying to state 
decision makers. This schizophrenic condition resulted because these 
evaluations might answer questions about how many students were served, 
whether services went to the appropriate students, and, to some extent, 
indicated the quality and impact of the program as it related to the 
objective-attainment model. 

Ultimately, the evaluation community gradually began to extricate 
itself from the purely methodological bog in which it had been mired since 
1965. New and sometimes radical evaluation methods were tried. Various
evaluation models were developed. Evaluators began to look to other
disciplines: public administration, political science, philosophy, and the
like. After a decade of false starts, evaluation began to move rapidly
ahead.

State agency evaluators, unfortunately, have not moved with the same 
dispatch as the rest of the evaluation community. State and federal 
evaluations had become annual events displaying the time-honored pretest, 
posttest scores while hope was maintained that some elegant multivariable 
design would provide the long-sought statistically significant finding. The 
policy community continues to maintain the expectation that evaluation
can answer the "go-no-go" questions. These questions are usually phrased:
Should the program be funded? Is Program A superior to Program B?
Occasionally a cost-benefit type of question or a question related to the
cost-benefit model is raised. 

THE PRESENT CONTEXT OF POLITICS AND UTILIZATION 

A great deal has recently been written on the politics of evaluation 
and members of the evaluation community, particularly those in state
agencies, are gradually acknowledging that they are involved in a political
process. This realization causes discomfort in many evaluators who,
heretofore, have considered themselves as solely reporters of objective
findings, interpreters of reported events, and dealers in scientifically
derived truths. 






For the purposes of this discussion, politics is defined, not as the 
partisan politics with which we are all familiar, but as the process for 
creating change in a society. Politics in this context involves a distribution 
of, or competition for, stakes within a society or group. Stakes can be 
defined as meaning money, ideas, prestige, influence, and jobs (cf. Sroufe,
1977, p. 4). Through a lack of understanding of the political arena,
evaluators avoided admitting the obvious to themselves: every evaluation 
report is essentially a political document. As Cohen (1970, p. 215) points 
out: 

One political dimension of evaluation is universal, for it involves 
the uses of information in changing power relationships; the 
other is peculiar only to those programs in which education is 
used to rearrange the body politic. Although one can never 
ignore the former dimension, its salience in any given situation is 
directly proportioned to the overt political stakes involved; they 
are small in curriculum reform in a suburban high school, 
somewhat larger in a statewide effort to consolidate schools, and 
very great in the case of national efforts to eliminate poverty.
The power at stake in the first effort is small, and its
importance slight. In the social action programs, however, the
political importance of information is raised to a high level by
the broader political character of the programs themselves. 

This observation is particularly relevant at the state educational 
agency level since, save for the federal programs, of course, it is the level 
at which the stakes are highest. Carol Weiss (1972, p. 328) reiterates this
theme when she says: 

Evaluation has always had explicitly political overtones. It is 
designed to yield conclusions about the worth of programs, and, 
in so doing, is intended to effect the allocation of resources . . .
This function, as handmaiden to policy, is probably the
characteristic of evaluation research that has attracted
competent researchers despite all the discontents and disabilities 
of its practice. 

Many state evaluators remain uncomfortable with this concept and 
consider it foreign to their work even though it is explicit in the evaluation 
process. Sroufe (1977, p. 1) for example, says: 

My own view on this question is unambiguous: formal evaluation 
is an inherently political process and it has, in some instances, 
even greater policy consequences than do board or bond
elections. Significant decisions regarding evaluation (i.e., what 
to evaluate, how, when and by whom), are made on the basis of 
the political values and resources of those— including the 
evaluators themselves— involved in any given system.

Perhaps it seems foreign because only recently has the interlocking of 
policy and evaluation appeared in the educational literature on evaluation,
though these concepts have been frequent themes in literature on public
administration. Policy does not just happen. It often goes through a
tortuous process of evolution. Because this process is by definition a
political one, state evaluators are frequently naive when they enter the
evaluation/policy arena. Their lack of understanding may be a basis for the
plaintive statements made by state evaluators that evaluation reports are
not being used and, in fact, seem to have little, if any, impact on the policy
community.

Lack of impact is certainly not unique to state agency evaluations; it
frustrates the entire evaluation community. Frustration is so prevalent
that a good deal of attention has been given to the subject in the past
several years. Among the more recent books discussing this issue are those
by Alkin, Daillak, and White (1979), and Patton (1978).

Two themes seem to emerge from the analysis of lack of impact or use
of evaluation reports. They can be generalized as methodological questions
and the "Two Communities" explanation.

Under the methodological heading, there are a host of criticisms about 
the instrumentation of evaluations, their design, their analysis, and,
frequently, the conclusions drawn from the analysis. Virtually every major
evaluation commissioned has been attacked on one or more of these
grounds. Audiences are confused by these attacks on issues about which
they have little, if any, understanding. The debate rages, external to the
policy community, and the result is not clarity, as it would be in an
academic arena, but confusion, as it is in the policy arena. The credibility
of evaluations has consequently suffered greatly.

There is no single evaluation design which can go unchallenged.
Certainly in the political arena there are a number of partisan positions for
challenging interpretations and findings of the evaluation. When
evaluations continually fail to deliver unequivocal answers to what policy
makers perceive as simple questions, the policy makers will turn elsewhere
for information on which to base their decisions. The notion that they
could or should get multiple input as part of the political decision-making
process is beside the point. What is of consequence is the perception by
the decision makers that evaluators have failed to deliver on an implicit
promise of an answer.

The second theme, the "Two Communities" concept, is intriguing. This 
concept was introduced by Caplan (1978) and was drawn from an analysis of
existing impact studies. Caplan (p. 50) says: 

The main factors which appear to limit the level of utilization 
can be found in that portion of the theoretical literature on
utilization which may be categorized as Two Communities
Theories. The essential line in this body of theory is that the
main reasons for the nonuse of knowledge can be understood by
examining the relationships of the researcher and the knowledge
production process to the policy maker and the policy-making
process. More specifically, it suggests that social scientists and
policy makers operate in separate worlds with different and
often conflicting values, different reward systems, and different
languages. Our data suggest that mutual mistrust is an
important factor in the separation of the producer and user
communities.

Clearly, the idea that evaluations must communicate is a dominant
theme in current thinking. The communication process is complex.
Embedded within this process is the concept of the credibility of the
evaluator and the evaluation. The mistrust mentioned above can be
mitigated by rapport between the communities. Conflict will probably
remain because of the keen competition for stakes. Evaluations which
judge the worth of an endeavor will inevitably gore someone's ox. The
move toward formative evaluations which argues for program improvement 
as opposed to purely quantitative summative evaluations may well be a 
bridge over the gulf between the two communities. 

CROSSCURRENTS AND CONSTRAINTS

The evaluation discipline is changing with a rapidity that is awesome. 
As previously mentioned, evaluation-related journals and books are 
proliferating in education and in the social sciences generally. This 
activity is healthy, but often confusing. Gurus quarrel with each other. 
Students of evaluation struggle to stay abreast of the ever-changing or 
emerging ideas in political science, anthropology, sociology, economics, 
philosophy, etc. Old truths are replaced by new truths. 

Confounding this situation are the shifting sands in governance, 
philosophy, and political activity. Kirst (1979) has identified more than 
fifty substantively new reform or categorical programs in California.
Although California is probably the leader in this type of change, it is not
unique among the states.

This ferment has implications for evaluation change in all state 
agencies. Specifically, there are four new phenomena which have broad 
implications for evaluation in state agencies: fiscal conservatism, sunset
laws, proficiency testing, and Title I models.

Fiscal Conservatism. While Proposition 13 had its origin in California,
the general principle espoused in 13 seems to be common across the
country. This general principle is one of fiscal and political conservatism.
With conservatism comes a renewed demand for evaluation related to
accountability. Questions, not unlike questions which arose in the middle
1960's, are being formulated. With severe constraints on funding, it is
common for state legislators to ask: How much are we getting for our
money? Which program is superior to another program? What are the
relative costs and benefits accruing from the implementation of these
various programs? 

Sunset Laws. Related to this fiscal conservatism is a move in the
various states as well as in Congress to enact the so-called sunset law
provisions. An explicit evaluation of the particular programs under
question is required. These evaluations are, like those in the middle 1960's,
asking the summative questions, the impact questions: What have these
programs accomplished? What is the worth of these programs? Again, in
some instances, the question asked earlier is posed: Is Program A more
effective and efficient than Program B?

Proficiency Testing. A third theme now sweeping the country is
proficiency testing, specifically for high school graduation. As of this
writing, thirty-eight states have either statutes in effect or about to take
effect on this particular issue. On first blush this issue would seem to be
one of measurement, but state agencies need to be aware that within the
question of proficiency testing lies the additional question of the relative
effectiveness of remedial strategies for those students who fail the test.
Nearly every statute contains some provision and funding for programs
relative to those students who are unsuccessful in completing the test and
such programs should be evaluated.

Title I Models. The fourth theme nationwide is the issue of the Title I
evaluation models. These models are based on a norm-referenced test and
an experimental model. This general methodology runs contrary to all the
evidence we have regarding the relative effectiveness of evaluation
designs. Such models would appear to be a step backwards if implemented
as planned. Pretest, posttest models were shown to be ineffective in 
producing the requisite information for Congress in the first instance. 

Evaluators, as they become more sophisticated, as they move away
from stereotyped designs toward more creative and influential evaluations,
as they progress from the summative to the formative (including special
limited focus studies), and as they move from the lock step of the
norm-referenced tests toward more appropriate instruments for more
diverse programs, are harassed by the demands of the four movements
described above. These movements are looking to the norm-referenced
test, to a return to the summative type of evaluation, to comparisons of
programs, and to the expectations held by policy makers for sure answers.
It will take sensitivity and creativity on the part of the state educational
agencies to maintain credible evaluations and to adapt within the
constraints of these crosscurrents in order to avoid regressing to their
previous positions.

By now my biases should be clear. Evaluation has come through a 
tumultuous decade and, while it is certainly not mature, it is at least a 
teenager. There are numerous schools of thought, but one gets the clear
impression that many of the leading theoreticians, by changes in their
original, often divergent, courses, are becoming, if not congruent, at least
close to parallel in their thinking. (This observation is not to suggest
stasis, only agreement on some principles; disagreement remains on
others.) I believe there is an obvious need for new approaches in state
level evaluation in order to improve state evaluation per se, and to provide
to constituent local educational agencies leadership by example and
precept. In my opinion, state evaluators have multiple roles; teaching is
one of them.

After I reviewed the literature, it seemed to me that state evaluation 
has been overlooked. Yet, state educational agencies are a unique
community, intermediate between the federal government and the local 
agencies. They are purveyors of information to diverse audiences and 
function under a variety of constraints which largely have shaped their 
products. In the teacher role, state education agencies can educate those 
who commission evaluations. We should look at new approaches in three 
areas: methodology or design, the political or policy context, and actual 
use.

DESIGN



The word, design, is used here in its broadest sense. The foregoing 
part of this chapter described the beginning of program evaluation and how 
that initial effort relied on people trained in other disciplines. Early 
evaluators based their designs on what they had been trained to do in
experimental research-based studies. They were reinforced in this
behavior by mentors in the universities. The decision makers fed the
process by demanding "impact" evaluations. This set of experimental
designs weakened when it became increasingly clear that little
policy-relevant information was forthcoming. Guba characterized
evaluation as a failure but saw hope in the works of several theoreticians. 

The door toward change was opened, characteristically, by Cronbach 
in his "Evaluation for Course Improvement" (Cronbach, 1964). Among the 
themes of this article were: "The greatest service evaluation can perform
is to identify aspects of the course where revision is advisable . . . The aim 
to compare one course with another should not dominate plans for 
evaluation." Cronbach further stated: 



Old habits of thought and long-established techniques are poor
guides to the evaluation required for course improvement.
Traditionally, educational measurement has been chiefly
concerned with producing fair and precise scores for comparing
individuals. Educational experimentation has been concerned
with comparing score averages of competing courses. But course
evaluation calls for description of outcomes. This description
should be made on the broadest possible scale, even at the
sacrifice of superficial fairness and precision (1964, p. 247).

At about the same time, Scriven (1967) and Stake (1967) argued for
judgment and description as part of the evaluation process with Scriven 
coining the now familiar summative/formative distinction. 

These three examples of change are cited in order to highlight the 
awareness held early on by these foremost thinkers, that strict
experimental methods had deficits for decision making and that a broader
concept of evaluation was necessary. Now we are beginning to see
emphasis on less structured, less formal approaches to evaluation. Terms
like "ethnographic," "case-study," "N=1" are beginning to enter the
evaluation vocabulary. The pendulum arc extends from Bernstein and
Freeman (1975) who feel that for an evaluation to be of quality it must be
based on random sampling and have a quantitative data analysis using
multivariate procedures within an experimental design, to Eisner (1975) 
who argues for "connoisseurship" and "educational criticism." While the 
trend is clear, no inference should be made that nonstatistically based 
studies are less rigorous. They are simply different. 

It is my concern that the pendulum will swing from one polar 
position— purely quantitative— to the other— purely qualitative. This 
extremism will do harm to the educational and policy-making community. 
It is essential to have an appropriate balance. Each extreme position has 
some merit, but not in isolation from the other and certainly not in blind 
application without a careful determination of the information needs of the 
client.

State evaluators will argue that often evaluation requirements are
established by the federal or state legislature. This is true. The
requirements are frequently in the realm of program outcomes, e.g., the
Title I models. As long as this focus continues, information needed to
enlighten managers about program practices will be largely unavailable.
Evaluators must be active in enlightening decision makers regarding what 
questions need to be asked. Evaluators must work collaboratively with 
program managers in the design as well as in the implementation of the 
evaluation. They can provide both the federally-required information and 
the information needed by decision makers. 

A good design is not one of elegance, but one of function. On this note
Cronbach says:

The designer plans on the basis of some conception of what an
excellent evaluation is or does. Some writers seem to judge a
design in isolation, applying standards of form. I apply a
standard of function; I favor whatever design promises to
increase the social benefit from the evaluation. Discussion of
design alternatives rests, then, on a view of how evaluations can
influence social affairs (1978, p. 27).



CONTEXT

Earlier in this chapter I proposed that evaluation occurs in a political
environment, and that evaluators are participants in the competition for
stakes, often influencing the positions but never wholly determining
decisions. The new approach I advocate is not one of open partisanship, but
one of awareness of the role and influence evaluations can have in the
political scene.

If evaluations are to be used in decision making and program
improvement, the evaluator has to be more than a mechanic tinkering with
numbers which are proxies (and often poor proxies) for outcomes of
programs. To evaluate a program effectively the evaluator needs to know
the nature of the program, the reason it was established, and the
motivations of the policy makers. All legislation is a compromise and
frequently ambiguously stated. Evaluators can assist in collaboratively
developing appropriate questions to be investigated based on a good
understanding of the objectives of the program. This task is not simple.
Many program objectives are not explicit in the policy statement; there
are usually several political motives at play, and questions are not common
to all the players. Determining what to evaluate can be difficult. The
evaluation of the Follow-Through Program is a classic example of
misdirected evaluation, misdirected not only in the sense of answering
(poorly) the wrong questions, but of decision makers pursuing faulty
evaluation policy.

The state evaluator is in an awkward position, as was the
Follow-Through evaluator. Frequently the evaluator has the evaluation
specified for him by either statute or administrative interpretation. He
has limited freedom to vary. In addition to this constraint, policy makers
often have exaggerated expectations of what evaluations can deliver.
These conditions are largely the fault of the evaluator through passive
acceptance of the fiats and by promising, explicitly or implicitly, that his
product will provide more than it can. I am in no way advocating
disobedience to statute or administrative regulation. I am advocating a 
productive role in which the evaluator can, with an understanding of the 
context of the evaluation, add enormously to the information transmitted 
and educate the policy makers through example. 

Most statutory language specifies demonstration of program outcomes 
through the use of pre-post testing in basic skills. Taken by itself this is 
poor legislative or regulatory language, and evaluators who slavishly 
comply only with the letter of the law are not doing all they can. They are 
attending to the summative question when they could be attending to a 
variety of questions which would influence and improve the program. Do 
not ask too many questions; ask the right ones and be sure you have a clear 
understanding of the objectives of the program. In collecting "impact" 
information (a euphemism for test scores) it is reasonable for the evaluator 
to think about what information he has or can get through a functional
design adapted to these interrelated questions: 



LEGISLATIVE QUESTIONS

1. Are you doing what we told you to do?
2. How well are you doing it?
3. What conditions exist now in program schools that are different from before?

MANAGEMENT CONSIDERATIONS

Is the right population served?
What are their characteristics?
Are resources allocated properly?
What problems exist in implementation?
Where are these problems?
What are examples of success/failure?
What are factors in success/failure?
Is there need for modification in administration of program guidelines?
Are needs of unserved students identified?
What are dissemination implications?
How are focused studies of problems identified?
What are refinement and maintenance implications?
Is there differential impact?

This list is not meant to be an exemplar of questions. Some questions are
policy relevant, some are meant to be purely formative and to assist 
managers in the conduct of the program. 

With an understanding and appreciation of the context and political 
ambience an evaluator can educate and lead policy makers and program 
managers to a greater appreciation of the information an evaluation can
provide. 



UTILIZATION 

The persistent, common, and woeful cry of the evaluator is that the 
evaluation has no impact on the community to which it is addressed. As
has been pointed out, there are a number of reasons for lack of impact:
the wrong questions were asked; the methodology used was wrong; the
technical quality was poor; experts quarreled over findings and eroded any
credibility the report might have deserved; and evaluators could not
communicate with, or understand, the policy community. Indeed a gloomy
picture, but generally true. Cohen and Garet (1975, p. 19) note:

In general, efforts to improve decision making by producing
better knowledge appear to have had disappointing results.
Program evaluations are widely reported to have little effect on
school decisions; there is similar evidence from other areas of
social policy. The recent national experiments in preschool and
early childhood education (Head Start, Planned Variation and
Project Follow Through) do not seem to have affected federal
decisions about priorities within such compensatory programs.
There is little evidence to indicate that government planning
offices have succeeded in linking social research and decision 
making. 

State agency evaluations are certainly not immune from, and indeed 
may be more vulnerable to, the factors which inhibit use. As in 
methodological matters, state educational agencies lag behind the 
evaluation community in having their evaluations create impact and 
change. This situation stems from the fact that most state-level
evaluations are annual, summative, and quantitatively presented. In
addition, writing tends to contain jargon and reports are fundamentally
uninteresting. Tables of F-ratios surrounded by turgid prose do not impress
or inspire the policy maker or legislative aide. When state educational
agencies take new directions in design and when state evaluators become
sensitive to the context of evaluations, some of the utilization barriers
may be minimized. One thing is certain: unless state evaluators take new
directions, the credibility of their work will continue to be seriously eroded. 

I have discussed both the design and the context of evaluation. It is to
be hoped that some of those concepts will improve the product.
Understanding the motives of those who require reports will help in
producing work which is credible to them. Credibility must be built; it
does not exist by itself even for the prestigious firm or academic, let alone
a state bureaucrat. Credibility can be both specific to a report or general,
based on merits of previous work. Credibility is also changeable (Alkin,
Daillak, and White, 1979).
Credibility is easier to attain in formative evaluation, where there is
exchange between evaluator and client on a reasonably frequent basis, than
in a summative endeavor where such interchanges occur rarely, if at all.
The client in a formative situation can obtain, fairly immediately, the
information he or she may need and act on it. The client has an aid in
making judgments about changes. While such is not the case with typical
summative evaluations, it seems sensible to provide the client progress
reports, explanations of the evaluator's procedures and, where appropriate,
a preview of coming attractions. This is a variation on the theme of
collaboration in the evaluation process and on the concept of the evaluator
as a teacher or, in this case, a guide through the labyrinths of a long-term
effort. The communication process can also help clarify the motives
mentioned previously and help the evaluator make any necessary changes in
the evaluation design. If, for example, the evaluator must deal with
student achievement changes but determines, through the open
communication process with a legislator, program manager, or whomever,
that there is an additional, new concern about institutional changes, he
can add this redirection to the evaluation plan. Communication is the
essence of an effective evaluation and of the sensitivity of the evaluator to
the needs of the client.

While I have discussed new approaches in three areas, these three
(design, context, and use) are obviously parts of a whole. An evaluation
carried out with an appropriate functional design, created with a
knowledge of the context of the program, and presented in a manner which
communicates has a high potential for impact. Finally, evaluators must be
patient. Their art is young, evolving, and maturing. Just as they have a
role in educating their clients, evaluators must also learn from themselves
and from their history.
CHAPTER 10

Problems in the Implementation and 
Acceptance of New Evaluation 
Approaches in the State Departments of 
Education 

Norman Stenzel 



From the beginning, let me assert that innovation in a bureaucratic
system such as a state education agency (SEA) is not an impossibility. In
fact, there may be some advantages for an SEA, such as the one with which
I am familiar,¹ that other institutions or settings may not enjoy. The
task of this chapter, however, is to examine barriers to innovation. To
serve that purpose, the greatest proportion of effort here will focus on
barriers. Topics will include personal and institutional barriers. A third
section of this chapter will report a modest attempt to validate the
content of the chapter. A final but brief section will suggest a few
positive aspects supporting the application of innovative evaluation
approaches by bureaucratic institutions such as SEAs.

Some limitations to the chapter should be noted: The types of barriers 
discussed are treated as a series and are not set into a coherent theoretical 
statement. Further, the barriers are not necessarily mutually exclusive. 
Although some of the points I am about to make are directly 
autobiographical in nature, other points reflect the activity or lack of 
activity of persons with whom I have frequent contact. Each point to be 
made, however, will be tied to a brief example when one seems to be 
pertinent. The examples should be considered both as grounding in reality 
and as an attempt to suggest experiences which may be similar to the 
experiences of others in other SEAs.



PERSONAL BARRIERS

By personal barriers to innovation I mean those conditions to which an
individual is sensitive and which serve to limit the scope of individual
activity.



Acculturation

Thinking about my own background and the innovations I attend to or
ignore, I can easily see that my ignorance of new approaches is in areas
most distant from my methodological preferences. Many of my own
attempts to deal with new approaches to evaluation are incremental to my
background; I cannot claim to have been involved in radical paradigm
shifts.

Coming from the University of Illinois and CIRCE (the Center for
Instructional Research and Curriculum Evaluation), I have been
acculturated to the acceptance of human judgment as an approach to
establishing the value of the thing being evaluated. There are necessary
concomitants to evaluation based on judgmental perspectives: multiplicity
of reality and the political nature of action. I
read with avid interest the newest materials from Hastings, Hoke, House,
and Stake. And kindred spirits are of interest too: Eisner, Hamilton,
McDonald, and Lou Smith. My acculturation, then, has a strong qualitative
cast. For others, acculturation obviously could be quantitative in nature.
The point is, however, that the changes and modifications in the practice
of evaluation by individuals are often strongly related to their heritage.


Language 

One aspect of heritage is the languages spoken by an individual.
Graduate students, researchers, and evaluators gain stature as they
demonstrate facility in preferred methodologies. Stature, in part, derives
from the appropriate use of language (jargon). That language serves to
separate the linguist from the non-linguist. Shibboleth is born!

My linguistic comfort level is challenged by words and symbols of
areas not often frequented in my intellectual endeavors. I do not always 
have access to others who might be more comfortable with another 
language, and without a translator, I am not always certain about the 
meanings and use of new words or symbols. 

There is a line between familiarity and fluency which needs to be
breached if innovation is to be implemented. There are hazards related to
language that the potential innovator would like to avoid. Some of these
hazards might include misinterpretation of words similar to words in
another language (cognates can be misleading), misinterpretation of ideas
based on incomplete understanding of terms, and misapplication of
terminology. I have avoided evaluation approaches burdened with new
language requirements because, beyond my need to learn the language, I
have to discuss the innovation with others. At that point I would have to
be the translator for others. That would be a test of fluency.

Self-Censorship

Ultimately, a decision to suggest an innovative evaluation approach
rests with the individual aware of what the innovation might be. Pointing
out a potential innovation involves risk-taking behavior. In a group setting,
the counterforce to risk-taking behavior is self-censorship.
Self-censorship may be stimulated by reaction to fears, but it also may
be related to roles and strategies. The setting may be judged not to be
right for a possible innovation because of such conditions as the potential
amount of opposition, or the amount of time available. The individual also
may not want to be perceived as an initiator or idea person at that time.
Or, the chances for acceptance may not be best until other things have
been done. Such considerations are based on the individual's estimate of
the situation.

Technical Ability 

In our evaluation unit, with specific assignments for individual
evaluators, the technical capabilities of the evaluator assigned to the task
definitely constrain the type of evaluation to be undertaken. To be sure,
technical assistance can be sought from others in the agency, but the
extent of the assistance by others is often limited by the requirements of
their own responsibilities. Obtaining technical capabilities by contracting
for persons outside of the office is possible where evaluation funds are
available, but is limited where funds are limited. Indeed, funding is often
limited.

It is a dilemma for the SEA to be expected to have technical expertise 
and not be able to quickly respond when in-house technical capabilities do
not exist. Even with such relatively common technical skills as the 
mathematics of regression, as in the case of Title I evaluation, the federal 
government saw fit to provide technical assistance to the states and local 
education agencies (LEAs). 

New approaches to evaluation may include the use of new technical
procedures which are not familiar to the evaluator. For example, the
advent of adversarial evaluation in judicial formats stimulated discussion in
our evaluation unit about the skills of lawyers in case building,
interrogation, and argumentation. It was felt that such technical
capabilities were only incidentally a part of the repertoire of our existing
staff.

In my own efforts to apply committee hearing formats to evaluative
purposes, special skills were found to be necessary. The procedural
requirements of committee operations include functions and powers of
committee members and chief counsels, and the rights of witnesses. These
matters also include skills and tools not commonly practiced by evaluators,
such as committee leadership, organizational tactics, and counseling
witnesses. In fact, in an early implementation of the committee hearing
approach, it was readily apparent that committee members required skills
in organizing and developing lines of questioning that were not immediately
available without practice.

Other skills may be required by other new evaluation approaches. For
example, the image of focusing evaluation attention as suggested through
the analogy of watercolor painting (Gephart, 1981) obviously suggests
editing skills. But what about the proper framing of the focal points? Are
additional skills necessary to select surrounding materials that lead the
reader to the focal points without distorting reality?

New skill requirements can be formidable barriers to implementing an
innovative evaluation approach. Some evaluators may not be comfortable
with the challenge and response nature of adversarial evaluation
approaches. Other approaches may have other types of requirements which
are not easily anticipated.

Best Fit

Rational design processes include determining the best process to
supply the needed information. With familiar evaluation formats, the
determination of best fit is not often difficult to make. The matter of
including new formats as approaches in the decision matrix is a slower
process. New approaches suffer when a lack of familiarity clouds their
utility. For example, in reading through many of the materials from the
Research on Evaluation Program at the Northwest Regional Educational
Laboratory, I cannot comparatively assess the utility of "phenomenological
evaluation" or "geocode analysis" against such approaches as "discrepancy
evaluation models" or "hierarchical cluster analysis."

Vision 

In working with the Illinois Gifted Program, it was found that
demonstration centers did not easily stimulate adoption of innovative
approaches to the education of the gifted student. At least part of the
explanation identified by the study suggested that potential adopters did
not see how the innovation could be adapted to fit a different setting: a
lack of vision. In addition, in examining the innovative teacher, we found
that the innovative teacher often accounted for innovations again and
again over time. It was often the same teacher who was involved in an
innovation now and in another innovation two or three years ago.

I used this frame of reference as an analog to review our evaluation
efforts over the past nine years. Indeed, it does seem to be the case that
even in our evaluation work there are those who just could not envision how
an innovation might be applied to the circumstances of our work. In
addition to our own staff, however, it is perhaps more frequently the
program staff we were to serve who lacked vision enough to allow
allocation of resources for innovative efforts.

As for the second point in the Gifted Program analog, it is more
difficult to provide such an estimate, given the staffing patterns of our
unit. The fact of a largely transient staff, coupled with resistance to
innovation from program personnel, precludes finding a matching pattern.
An informal impression is that those staff who were innovators did seem to
be on the lookout for new ideas which fit needs.

Proximate Time 

Exposure to a new idea can be fruitful only if there is an opportunity
to apply that idea. I would call that a fortunate juxtaposition of events. In
one instance when I was an observer at one of the first attempts to
implement a judicial model evaluation, I was impressed with the potential
of the model. Soon thereafter, while my interest was fresh, an opportunity
arose which allowed me to implement an adversarial evaluation.

The proximity of the stimulus event to an opportunity to apply the
idea apparently is important. At least it is parallel to what we found in the
Illinois Gifted Program. New ideas need to be implemented within a short
time of exposure, or the ideas will be set aside and forgotten.

Unfortunately, the optimum time may have passed. If planning for an
evaluation has gotten under way, the likelihood of changes of an innovative
nature decreases. I have found in evaluation workshops conducted by our
unit that participants who have already made initial plans are not likely to
change those plans. If planning had not gotten under way, change was
much more likely.

Isolation 

The dissemination of innovation in ripples from one location to another 
is a pleasant image. Unfortunately, the average SEA functionary is likely 
to feel quite isolated. If there are ripples, it seems that they have 
dissipated before they brush the shores of the SEA. 

Isolation is due to a variety of factors. Out-of-state travel budgets to
attend professional meetings are extremely constricted. In our office,
attendance at the annual meeting of the American Educational Research
Association (AERA) is limited to one person per department. The
department in which I operate includes planning, research, and
evaluation, all of which have a wide scope of concerns. One person in
attendance at meetings of such breadth as AERA cannot serve all of the
interests of the department. Some of us consequently are frantic postcard
senders ("Please send me a copy of your presentation"). But papers are
slim inspiration for implementation and provide fewer hints for adaptation
than would a few face-to-face questions and responses at the end of a
presentation.

I have not found a collegial network for exchanging papers and ideas in
order to obtain reactions among SEA personnel. To be fair, many of the
papers produced by SEA personnel may be or seem to be idiosyncratic. I
must also note that some links do exist between states through the AERA
Special Interest Group for State Office Researchers, through the
Committee for Evaluation and Information Systems of the Council of Chief
State School Officers, through advisory groups, and through former
professors.

Within Illinois, a few of the staff in our research department are
participants in a state-based group called the Research Advisory Council
(ReAC), composed of representatives of the educational community, both
local educational agencies and universities. ReAC is convened twice a
year to react to SEA research and evaluation activities and reports in a
one-day forum.

Graduates of universities in Illinois often do have some links with
former professors. But distance and time take their toll on such linkages,
and collegial contacts with universities are limited. The benefit of contacts
and communication does not permeate much beyond the immediate
participants. The number of projects undertaken without benefit of
broadly based perspectives is much larger than those where contacts occur.
INSTITUTIONAL BARRIERS

In addition to these personal barriers to innovation, there are
constraints on innovation related to the individual within a context. In my
case the context is a bureaucratic institution. In this section of my
"confessions," I will attempt to enumerate and elaborate upon factors
related to the dynamics of the institution which appear to be counterforce
to innovation. At times the institutional features identified here are the
result of interpersonal interactions involving features outlined in the
previous section. There are additional features, however, involving
interactions with persons other than evaluators. Another feature relates to
the strategic calculus dominant in the institutional hierarchy. These are
the types of features I am referring to as institutional barriers.

Staffing Plans 

General strategies prevalent in an institution set much of the tone
which either favors or discourages innovation. Methods of staffing and
staffing patterns, for example, are powerful shaping devices. During the
past decade, at least two staffing strategies have been used in the
institution with which I am familiar. One approach specifies the
institutional system and defines the requirements of that system. The next
step in that process is to develop job descriptions required to implement
the system. The conceptualization of the system is the most creative work
to be had in the institution. After that, the majority of work is directed
toward the maintenance and implementation of the system. The
expectation obviously is that a new employee will become a part of a
previously defined system and perform a set of activities which derive
from the definition of the system. The individual selected for employment
under such circumstances need only represent a reasonable fit to the
requirements of the system and continue to perform those functions to an
extent which does not excessively disrupt the system. A second approach
delimiting institutional employment was to identify the general principles
upon which the system would operate and then pick the candidates who
would be compatible with such principles.

A job description in the former case would include something like: 
"Conduct and implement Title I, 89-10 evaluation utilizing data based on 
Normal Curve Equivalents and federally prescribed evaluation models." 
Another job description might suggest that the candidate should be familiar 
with both qualitative and quantitative approaches to data gathering and 
analysis for evaluation purposes. In the latter format, the job description
would prescribe timely evaluations tailored to meet the needs of decision 
makers. In both cases, the emphasis is on filling a role as specified. 

In a small unit of less than 20 persons within an office of over 900, 
evaluation staffing presents a variety of problems. For programmatic 
functions, hiring evaluators without programmatic expertise is a barrier to 
providing service. We find that evaluators with disciplinary backgrounds, 
however, have a limited perspective on the potential scope of evaluation. 

Openings for evaluators in an SEA may occur at almost any time
during the year. Attempting to hire staff at times which do not match the
academic year does not always yield candidates who have a broad
evaluative perspective. In one of our recent attempts to hire an evaluator
for special education activities, for example, persons interviewed for the
position had psychology, some special education discipline, or governmental
career backgrounds. Both the psychologists and the special education
personnel viewed evaluation in a case diagnostic sense. Those candidates
with the governmental career background at best perceived evaluation as a
management supervisory task.

Staff Utilization 

The lack of flexibility in the use of a portion of the office staff proves
to be a limiting feature in undertaking evaluation efforts. In our office, we
provide evaluation services to other sections based on a "contract" between
our unit and the section involved. Such contracts contain timelines for
completion, and thereby preclude alternative uses of staff allocated to the
tasks to be completed. Further, in the case of federal funding sources,
evaluation personnel are confined to work on matters relating to that
funding source. As a result, the possibility of shifting staff from one task
to another, utilizing the talents and skills of one individual in an area to
which that individual is not initially assigned, becomes nearly impossible.
Under such constraints, new methodologies which take more time than can
be allocated will have to wait until other opportunities for their
implementation.

Politics 

Internal politics, as the games of power played as part of office
dynamics, have an influence on what happens and does not happen in
evaluations conducted by the SEA. The regime in power may set a
conservative, moderate, or progressive tone. The two most recent
administrations might be characterized as talking progressively while being
moderate in practice. Any innovations, then, will be modified by the
"tone" of the regime. For example, recently the department
administration reserved veto power over the final phase of a proposed
study in order to preserve the office's autonomy in decision making, a
rejection of the "evaluator as surrogate decision maker" role.

External politics is also important. Taken into account often are such
factors as the image of the office, good will, or the influence of external
groups. One example might come from evaluations involving other state
agencies. The problems of status, prestige, and power are of great
importance in such a setting. Diplomacy is a necessary part of the
evaluation process. A second setting involves evaluations of activities of
programs in the largest school systems in the state. If the evaluative
findings are not altogether flattering, pressures on the office are likely to
emerge from other sources. Innovations, then, may be rejected because of
the external politics involved in settings where the innovation might be
applied.

Audience 

Evaluation as an act of communication has to consider its audience.
Although I am fascinated by the analogy of watercolor painting to
evaluation presented by Bill Gephart (1981), I know at least a half-dozen
administrators who would look askance at an evaluation proposal utilizing
the language of a painter to describe a plan for an evaluation.

The audience for an evaluation, then, is a significant constraint on the
nature of an evaluation and evaluation report. Many of the educational
administrators we deal with have educational research reports as their
ideal frame of reference. Certainly, reports with familiar formats are
reassuring. Reports with easy-to-find summary sections seem to please
harried administrators; at one time in our SEA the prescription was for a
one-page memo to summarize anything important. Although this is no
longer true, the idea of a short report as opposed to a long report is still in
favor.

Innovative reporting techniques are not foreign to our agency. One
effort, for example, used a brief movie as part of the report. A movie can
be a high-impact reporting device for an audience willing to stay put long
enough to get to the part with the hero riding off into the sunset. Some
reports have multiple audiences: office administrators, parents,
legislators, and even students. Reporting, then, often has to be practical
for multiple audiences. The movie in this example could not accommodate
communication to multiple audiences with ease. It was bound by access to
the film and film showing technology.



Standards

The matter of standards is crucial in evaluations. Standards are the
specification of what is considered to be justification for a statement
about the value of a thing. The acceptability of the justification to the
audience for the evaluation and to those who have an interest in the
outcomes of the evaluation needs to be a part of the considerations
undertaken by an SEA in reviewing innovative evaluation formats.

The decision to base the value of a thing on statistical significance is
not universally acceptable as the justification of worth. One current
alternative to that approach is to allow that the esteem in which a thing is
held is an indicator of value. Evaluations of this sort are also not
universally acceptable as the justification of worth. Justification for some
is not justification for all. The value attributed to a thing is not universal.
Acceptable justification for claiming that value exists varies from group to
group.

The results of evaluations conducted with these different approaches
to standards can be diametrically opposed to each other if taken to be a
general statement that a thing is valuable. In my work with Follow
Through parents, I found many to be strong advocates of programs that
statistically did not demonstrate the success that other programs had. Yet
parents met with state office bureaucrats, sent letters to Washington, D.C.,
bureaucrats advocating the preservation of the programs provided for their
children, and picketed local bureaucrats in an attempt to mobilize their
assistance in that task. The sense of value held by the parents was founded
on observing the enthusiasm of their children and a belief that what was
being done especially for their children was better than what would exist
without Follow Through. The quantitative approaches used by the federal
contractors were not convincing to the parents.
Turf

It may be that innovative evaluative approaches will run into
difficulties stemming from defined or understood matters of turf. In our
case, for example, direct access to members of the State Board of
Education is defined as off limits for office staff in general. The staff who
are to provide liaison with the State Board members guard their knowledge
of formats for presentations against intrusion by others. Any innovative
evaluation approach which would require access to State Board members
would have to avoid the appearance of infringing upon the turf of another
section.

Accountability

The era of accountability has had an impact on expectations for
evaluation. The advocates of accountability will not be impressed with
new approaches to evaluation unless they contribute to establishing the
accountability of the thing being evaluated.

Accountability to the SEA as a funding source often emphasizes
congruence to proposals submitted, reviewed, and accepted. Congruence
methodologies for evaluation are well known, and current practices are
sufficiently systematized to allow SEA project evaluators to be
comfortable with the task and the results.

Aspects of accountability not easily met through existing methodology
are concerns of benefit and efficiency. In contrast to discrepancy
approaches, benefit and efficiency approaches are more complex and more
difficult to implement. For the SEA, attempts to provide technical
assistance in evaluation to projects often include congruence approaches.
Training others in the assessment of benefit and efficiency has been less
successful. Consequently, these more difficult tasks are not as frequently
undertaken.

New evaluation methodologies which are not complex have a better
chance of being implemented than those appearing to
be more complex. In addition, methodologies not seen to be compatible
with the needs of evaluating for accountability are currently ignored. New
methodologies not serving such needs will also be ignored.

Planning 

A current motif in our institution is long-range planning. What should
be done at some time in the future? Immediately in response to the notion
of planning, an old saw comes to mind: Planning is an excuse for not doing
anything. Plan, replan, and plan some more and you never have to get to
an implementation stage.

Planning is often nothing more than linear projection. The past serves
to constrain the possibility thinking of staff in the present. In one case, a
study by external consultants suggested that the office should improve its
capabilities for evaluative planning. The response of the office came in
terms of increasing the staff in an already existing unit, one dedicated to
conducting internal audits.

In planning, habits and preferences of the existing staff are constraints
which are often implemented. Individuals may at times resist or ignore
change because of the comfort or security of the ways it always had been
done, or because of the identity one has achieved in doing a task in the past.
These individuals resist innovation through planning. When planning is
thrust upon them, they maneuver so that only linear, incremental, and
small changes take place.

Fitting innovation into such an environment can take advantage of the
planning requirement. In spite of linear thinking and incrementalism,
innovation is possible. A colleague of mine indicated that he introduced
ideas and encouraged the program personnel to think that the ideas were
theirs. Innovations which can be instituted over a period of time and which
are compatible with former systems have an opportunity to be adopted.

Time Available 

The application of innovative evaluation approaches to SEA
enterprises is burdened by the constraints of time. Evaluations in our
office often are not leisurely undertakings. It frequently seems as though
the information was needed yesterday. Even in projects where ample time
is initially anticipated, it often turns out to be the case that decisions need
to be made sooner than had been projected.

Good time estimates therefore are needed. Initially, the time
estimates are necessary so that the feasibility of an evaluation approach
can be estimated. In our case, at least an additional month is added to
account for our data control and review procedures. Secondarily, time
estimates may also be necessary during the course of events. In many
cases data are called for prior to the completion of a study. The timing of
the availability of data is a necessary feature of the replanning and
reallocating of resources needed to accomplish the task.

Institutionalized Methodology 

In our current operation, because of limited staff and high demand,
efficiency is important. One way to increase efficiency is to use similar
approaches to evaluation year after year in program after program. The
approaches used become systematized and routinized. This allows advance
planning and provides recognizable forms for data collection from clients
and those being subjected to the evaluation.

The evaluator's management task is eased considerably in such a 
setting. Planning becomes a simple updating and tinkering with plans from 
the previous year. Negotiation with clients to implement evaluations 
counts on the familiarity of the clients with the existing processes. Little 
explanation is needed as jargon and terminology become more and more 
familiar to those involved. Data gathering is facilitated as evaluees learn 
what to expect, and organize their practices to serve such expectations. 
They do this by creating record systems and data gathering processes of 
their own to serve the evaluation on an annual basis. The efforts of 
evaluees may even be supported by the SEA directly through training or 
indirectly through praise. 

In this setting, new evaluation models will have to overcome 
structures and superstructures in place which serve old evaluation 
models. In our SEA, evaluations serving such tradition do not take changes 
lightly. There is a debt of obligation to consider. 
Justification 

Activities undertaken in the SEA require justification both at the unit 
level and at the office level. Defending a new evaluation approach to 
peers and colleagues does not often present a problem. Yet in the office 
hierarchy there are systems like our data coordinating council, an internal 
control mechanism designated to limit the data burden for school personnel 
for any data collection process undertaken by the SEA. If evaluators 
determine that a new evaluation process is desirable, the lead time 
necessary to introduce any incorporated data collection process to the data 
control system rules out quick response and could even prohibit 
implementation if the data burden is considered to be excessive compared 
to other data already justified. In effect, data collection becomes almost a 
first-come-first-served process. 

Internal Advocacy 

Another feature of innovation discerned through the studies of the 
Illinois Gifted Program is that where new programs got started, it was 
often due to the efforts of an internal advocate. The internal advocate 
was someone who not only was a spokesperson for gifted education, but was 
also a salesperson. 

A sales pitch is often necessary. In evaluation, the application of 
innovative approaches certainly will not take place without the internal 
advocacy of a person with persuasive ability unless it is imposed upon the 
agency by an external authority. Sales pitches aimed at 
administrators, program personnel, audiences for evaluative findings, and 
other evaluators are activities not easily accommodated in evaluation 
training. Discovering the right sales pitch for the right target is an art 
practiced by sales personnel on an everyday basis, but infrequently by 
evaluators. It may be that assistance to evaluators in sales techniques will 
be a concomitant of innovation. 

Status Quo 

I suppose that there are those in every institution or bureaucracy who 
are dedicated to preserving things the way they are. They are often the 
no-sayers, the guardians of the gates. It is also a major function of 
operations manuals to codify the world according to SOP (Standard 
Operating Procedure). 

In the evaluation unit in our office, the operations manual has the 
function of prescribing the form of agreement between the evaluators and 
other sections in the office. It may be that standard procedures will be a 
constraint on the introduction of innovative evaluation into a setting. For 
example, the form of our agreement may not accommodate a responsive 
approach to evaluation. Although it is far from conclusive proof that the 
agreement deters responsive evaluation, indeed very little of our work 
could be described as being very responsive in Stakeian terms. 

In addition to the comfort of doing things in familiar ways, 
functionaries in institutions often operate under the dictum, "Don't do 
anything that will rock the boat." This does not outright prohibit 
innovation, but it does preclude innovations which are attention getting. 
Evaluations, then, often have to be low profile in nature. They cannot 
cause evaluees to have expectations for anything different. Although the 
evaluation may stimulate change, the risk of arousing expectations, and 
thereby pressure, is often a restriction on the type of evaluation conducted. 

One step beyond the low profile expectation is the position that 
evaluation is not to add to the existing hassle. As one harried 
administrator put it, "Don't tell me about anything that we're not already 
doing something about." In such settings, innovation in evaluation will have 
to provide information without rocking the boat. 

Data Imperialism 

The prerogatives of authority as represented in the SEA include being 
able to ask for and obtain almost any information from those subjected to 
their authority. School districts and varieties of educational projects are 
those subjects. They are subjected to anticipated data burdens and often 
additionally requested information which had not been anticipated. It 
amounts to a kind of imperialism where the burden not only requires 
products but also appropriates the labor of the entity subjected to the 
imperialism. 

One consequence of data imperialism is the low priority local 
districts place on many of the requests. Lack of local records, lack of care 
in generating records, lack of local utility of the data generated, and often 
lack of understanding of processes all jeopardize the quality of 
imperialistic studies. The initial implementation of the infamous Title I 
evaluation models suffered from the consequences of data imperialism. 

Innovative evaluation techniques for SEAs will not be of much use if 
they contribute to the imperialist tradition, or if they do not assist in the 
remediation of the imperialist role of the SEA. 

Data Control 

Our institutional response to those oppressed under the yoke of data 
imperialism is data control. Reviewing all of the data collected by all 
office sources, attempting to reduce redundancy, and regulating all new 
attempts to gather data became the responsibility of a data chief, his 
staff, and a supporting representative committee. Innovations in 
evaluation requiring large data collection efforts are of limited 
acceptability. 

Under data control systems, the information gathered, ideally, should 
be added to a data pool to be tapped for other purposes if the need arises. 
Data gathered for evaluation, too, should be considered for such a pool. 
Data gathered in forms similar to but not compatible with extant data 
because of a new evaluation system will not be favorably received. 

Using extant data in evaluative models will be favorably received in 
the data control processes of the SEA. Our office recently used teacher 
service records to examine the nature of LEA superintendent career 
patterns. New evaluation models devoted to secondary analysis, or analysis 
of census materials, will be useful. 

At the state level, utilizing data gathered and reported by LEAs is 
typical. Data based management requires ample notification of the need 
for the data, often at least a year in advance, for local units to be able to 
comply with requests for data. 




Finance 

Financial factors influence the feasibility of innovation. Budgeting 
features and cost deliberations are major impediments. 

Budgets are established over a year in advance in most institutions 
such as ours. At that distance, only the need for a few major evaluations 
is apparent, and only general parameters of the evaluation can be 
conceptualized. The budget line item for evaluations, then, is constructed 
out of this general level data, and reflects costs most familiar to the budget 
preparer. 

More proximate to the actual evaluation is the preparation of the 
detailed budget for an evaluation. In a tightly budgeted setting, such as 
the one with which I am familiar, the need for accurate cost estimates is 
important. In such a case, it is far easier to provide estimates based on 
previous experience than to provide cost estimates for procedures and 
processes which are only dimly perceived. New evaluation forms suffer 
from the lack of familiarity with costs. 

Allocation of resources is added to the financial consideration. Our 
office operates under a ceiling on staffing. A major question then 
becomes, "Can we do it without additional staff?" If additional staff are 
needed, the issue becomes, "What is the extent of that requirement? Can 
temporary staff be hired? Can we get along if we hire temporary staff for 
only a short period of time?" 

There is yet another type of fiscal consideration: the assurance of 
benefit from the funds invested. Tried and familiar approaches are known 
to work at least to some degree and have known outcomes. In an 
evaluative bag of tricks, the expectations of a client can be matched to a 
familiar approach. The approach can then be promoted and reasonable 
assurance given that the payoff will be worth the time, effort and expense. 

Limited Spontaneity 

With advance planning, accountability, coordination with other 
groups, tight timelines, and limited funds, the opportunity to pursue 
unanticipated outcomes, alternative processes, or sudden ideas is put 
aside. Advance planning locks personnel into predetermined patterns of 
acceptable behavior. Accountability to do what was specified in 
agreement with clients does not allow spontaneous adjustments to include, 
take advantage of, or initiate new evaluative approaches. Agreement from 
clients to initiate an alternative would require educative and negotiative 
activity too involved for quick response. 

Request for Proposal Processes 

Requests for proposals (RFPs) are familiar documents to an SEA 
bureaucrat. The evaluator in the SEA may be involved in responding to 
proposals, writing proposals in response to RFPs from other sources, 
reviewing proposals written by other groups in the office, and developing 
RFPs which allocate money for evaluation efforts. These are the 
variations on the theme incorporated in the RFP process, and they 
are often a constraint on innovation. 

In responding to proposals from other sources, SEA writers are likely 
to play it safe when evaluation components are required. An evaluation 
schema should not detract from the main purpose of the proposal, and 
should meet the expectations of the readers who will rate the proposal. 
Proposal writers want a winning formula. Recently, a series of discrepancy 
evaluation model workshops was held throughout the United States. Part 
of the "hype" for the workshops indicated that a large number of proposals 
funded by federal sources incorporated a discrepancy evaluation model. 
Eureka, the winning formula! Thereafter, a large number of responses to 
federal RFPs from our office contained the winning formula. 

If not the discrepancy evaluation model, some other traditional 
approach which it was anticipated would be readily recognizable to 
proposal readers was included. Tradition is another attempt to include a 
sure thing. 

When RFPs are issued by the SEA calling for proposals from bidders, 
the RFP generally is constructed according to the stock format suggested 
in our operations materials. In RFPs it is not wise to deviate too far from 
the stock format for a variety of reasons. First, prior to being issued, the 
RFP will have to be reviewed according to office procedures. Deviation 
from the stock format would require justification and time-consuming 
explanation. Why would anyone risk delay? Why would anyone want to 
spend time and effort to educate a whole list of potential signatories for a 
sign-off sheet? There are other things to do. Second, using prespecified 
components keeps work to a minimum. The stock evaluation component 
phrasing includes "measurable objective" and "specific timelines." The 
wording is reflective of Mager's objectives, and was selected because it 
appears to assure accountability or, at least, to suggest the appearance of 
audit-like processes. In adopting that terminology and conceptualization, 
responses to proposals by bidders are cued to what, for all intents and 
purposes, is a quantified or timetable oriented evaluation. 

When proposals are returned for review by the SEA, the prespecified 
criteria matching the RFP tend to discourage approval of innovative 
approaches. Standards for judgment are normative. 

A MODEST APPROACH TO VALIDATION 

The standard critique of personalistic accounts such as this is that one 
person's view of a setting is just that— one person's view. Other frames of 
reference could well be necessary in order to arrive at an understanding of 
a situation or at an approximation of reality. 

A second criticism is that personalistic accounts are often situation 
specific and lack generalizability. Even though there are SEAs in other 
states, there is no guarantee that conditions in Illinois are similar to 
conditions elsewhere. 

Both of these criticisms are touched upon in this section. In respect to 
the first criticism, collegial feedback was sought and obtained. In many 
writing efforts, the nature of such feedback is obscured in the 
undocumented revision of a paper. In subsequent paragraphs, something of 
the feedback for this document will be reported. In respect to the second 
criticism, a modest attempt to gather survey data from evaluation 
personnel in other SEAs was implemented. (For a description of 
methodology, see Footnote 2.) A summary of those results is also included 
below. 
Other perspectives from within the Illinois agency were interesting in 
themselves. Five responses were obtained and are summarized here. 

One review of an earlier draft of this chapter by an administrative 
level person assumed the task of defending the setting. However, even at 
that, the defense only singled out 5 of the 31 items as being disparate from 
his administrative frame of reference. 

Gatekeeper oriented descriptions in the paper were questioned first of 
all: "A year ago, I was quite concerned about the gatekeeper 
problem... [but direct contact with top administrators] negated the 
gatekeeper role." This theme applied to comments on the control 
mechanisms present in this office, such as our data coordinating council. 
The reviewer responded to the idea of the necessity of a data pool as 
sponsored by the data coordinating council in this manner, "True, but not 
impossible. The system forces the evaluator to be specific and not fish too 
much." At another spot, when speaking of justification as a constraint on 
innovation, the administrative critic commented, "I doubt that he [an 
administrator] would turn down an innovation if benefit can be shown." 
(Aye, there's the rub. At another point in the chapter, I have mentioned 
that lack of available information about potential benefits is also a barrier 
to innovation.) 

Another critic provided counter examples to several of the 
institutional features pointed out to be "turf" problems. The critic 
provided one illustration from the area of migrant education: "We were 
able to convince (staff) that other methods than test scores were better." 
When this paper mentioned our system of contracting as a potentially 
negative feature, the critic provided a denial: ". . . The agreement 
(system) has not kept [the evaluator] from providing service in the area of 
LEA Services." 

Another of the reviewers provided a comment which adds dimension to 
the nature of personal limitations. In examining the distinction between 
personal and institutional barriers, this critic indicated that there may be 
self-imposed constraints. If an individual wants to get around the system, 
there generally is a way. 

The other reviewers provided little additional criticism, although all 
reviewers had been urged to point out elements of the chapter which they 
did not believe to ring true. Several, however, did suggest constraints 
which they believed not to be included in the paper. Two general positive 
review statements were among the comments obtained: "Good readable 
paper" and "Fascinating paper!" All in all the criticisms and challenges 
only covered a few of the aspects of the paper. Rather than refutation, 
the comments which were obtained may point out that much of what may 
appear to be impediments is idiosyncratic. Interactions between one 
individual and other participants in the bureaucracy may be different at 
different times, and it may be different if others were to broach the same 
idea. (Is the "stock" of the individual on the "rise" or is it "falling"?) There 
are examples where "rules" are broken and where money is found when 
none existed before. The investigation of such dynamics is beyond the 
scope of this chapter. 

A second set of intensive reviews was conducted. Four persons not 
employed in the same bureaucracy were invited to make comments. Two 
were program auditors with the state government in Illinois, and two were 
associated with evaluation in other SEAs. 
The program auditors both positively received the paper. From their 
comments, it appears that some of the points struck a positive chord with 
their experience. Points particularly noted as being pertinent included: 
best fit problems (knowing what a new format can do); isolation (especially 
the "slim inspiration" of papers); time constraints (it appears as though 
program auditors face a piece-work review of their productiveness); the 
critical nature of standards (program auditors apparently have difficulty 
with judgmental data used in some evaluation approaches); and 
institutionalized methodology (program auditors are involved in an area 
where routinization of techniques and "no surprises" in regard to 
expectations are perceived to be important). 

The one point both program auditors made a strong response to dealt 
with accepting human judgment as an approach to establishing the value of 
the thing being evaluated. One respondent felt it would take some time to 
explore the differences between us: "We'll have to talk about this over a 
tall glass of wine ..." 

The two reviewers from other SEAs found few items to oppose. Some 
points, they felt, were not particularly representative of their setting, but 
they could believe that they might exist elsewhere. One reviewer, for 
example, did not feel that program administrators were much of a 
constraint upon evaluations. Indeed, authorization for evaluations in that 
state at times came from the legislature. "The program staff have no 
choice; they don't hold the purse strings." Both reviewers felt that their 
staffing did not appear to be as greatly constrained as that in Illinois. 
Numbers, although limited, were not sparse, and applicants were generally 
adequately qualified for the positions. Two personal barriers did not 
appear to them to be of particular importance: "language" barriers and 
technical ability. 

In general, the two SEA reviewers positively perceived other aspects 
of the listing provided in this chapter. One reviewer commented, "It listed 
some things I would not say unless I were leaving my job." When pressed 
about the things this might be, the reply was, "The importance of office 
image, and the influence of external groups. Our office has to maintain 
the appearance of independence from such influences; a staff member 
implying we were subject to such pressures would not be appreciated." 

In a broader attempt to gather information about the content of the 
chapter, a survey of SEA evaluators brought a response from 17 out of 27 
persons polled. Responses indicated that eight aspects of the paper are 
possibly generalizable, that there was little agreement about five points, 
and that four points are not generally perceived to be barriers to 
innovation. 

The points about which there was agreement included both personal 
and institutional impediments. The majority, however, were institutional 
in nature. 

There was one personal barrier of importance according to the survey 
respondents. This was the acculturation of the individual. Translating this 
into a statement about the conditions necessary for innovation suggests 
that a broader scope for the training of individuals appears to be an 
important feature in promoting innovative efforts in SEAs. 

Many of the institutional features which were generally perceived as 
barriers to innovation are not surprising. Staffing strategies and staffing 
patterns for evaluation groups are included. The availability of staff to 
implement evaluations was also considered to be a constraint on innovation 
in spite of what both of the SEA persons interviewed maintained to be the 
case in their evaluation shops outside of Illinois. They may be the 
anomaly. Limited time, limited cooperation from other agencies and 
groups, and limited funding were also perceived to be institutional barriers 
to innovation. In addition, institutionalized methodology appears to be a 
common feature of SEA evaluation approaches. The need to use approaches 
which can be implemented without considerable "up-front time," and 
utilizing an approach which people are familiar with, are advantages of the 
institutionalized evaluation approach. Acceptance of those points, 
however, inhibits innovation. Finally, SEA personnel apparently agreed 
that innovations occur less frequently where there is no internal advocate 
for evaluation. At least it would appear to be likely to a majority of the 
SEAs responding to the survey. 

Those points upon which agreement was not great suggest that the 
items may be barriers in some cases and not in others. The points 
included: the difficulty of assessing the benefit of an innovation, linear 
thinking in planning, the lack of juxtaposition between awareness of an 
innovation and an opportunity to implement it, the languages spoken and 
understood by the evaluator, and the tone of office leadership. All of these 
were rated as moderate in significance. 

The final set of items are those which respondents generally agreed 
were not significant as barriers to innovation. There were four points in 
this group: the lack of technical expertise, the lack of a match between an 
innovation and the purpose for evaluation, the lack of understanding of the 
innovation, and incrementalism in change. 



The SEA can be a positive force for innovation in evaluation. This can 
include both innovation in the practices of SEAs and LEAs. 

Authority 

The SEA may serve as the authority, the enforcement agent for 
programs requiring evaluation. In such a setting, the SEA could be a 
positive force for innovation. It could require innovation to take place. 

The SEA in Illinois is currently involved in developing a process by 
which special education service delivery units will be required to examine 
their activities related to Public Law 94-142. The process will include the 
incorporation of stake holders in the examination of the program 
components. The concern for stake holders is one of the current innovative 
efforts in evaluation. 

The ability to undertake such an effort requires allocation of funds, 
personnel, and time. The whole process is an elaborate orchestration of 
factors critical to the ultimate success of the undertaking. The effort 
requires development and dissemination efforts. The SEA not only has the 
task of insuring that a new process is conceptualized and systematized, but 
also of obtaining appropriate implementation through broadly conceived 
dissemination tasks. Dissemination requires political efforts to set the 
stage for acceptance of the innovation. Next, technical assistance will 
have to be provided, not only to establish knowledge about the innovation, 
but also to insure that attitudes and personal conditions will be supportive 
of the innovation. 

Meta-Evaluation 

Another aspect of authority inherent in the roles played by the SEA vis 
a vis evaluation is the power of review. Evaluations both in and outside of 
the office may be subject to review for a variety of reasons. Many times 
the reviews conducted will not be intended to result in a change in 
practices. Their purpose commonly is to provide an estimate of the 
strength of the results of an evaluation for decision makers in the office. 

At times, however, the purpose is deliberately directed at the 
improvement of current practices. In a recent case, our unit was invited to 
review an accreditation styled process. Using a variety of tactics, we were 
able to examine a number of aspects regarding the processes used. Action 
taken on the results of our meta-evaluation did not result in the 
implementation of a new evaluation process, which was a possibility, but 
did provide incremental changes in the examined system. It is quite 
possible that in other cases more radical evolution could take place. 

Integrity 

In my work with the bureaucrats of Illinois, I do not find a 
characteristic of slovenly intellectual activity, as the negative connotation 
of the term "bureaucracy" would suggest. Rather, I do find many 
conscientious functionaries attempting to do the best they can within the 
system. They are people of great integrity. When better approaches to 
accomplishing the tasks they are responsible for are identified, they will 
become champions. They will work for proven innovation, but prudently 
are wary of newness for newness' sake, especially when present processes 
function, even if slightly imperfectly. 


A FINAL THOUGHT ABOUT BARRIERS TO INNOVATION 

The varieties of barriers identified in the first portion of this chapter 
may outnumber the few items used in defense of the SEA in the section 
above, but the optimist in me says that the barriers are not 
insurmountable. Awareness of the barriers and the forms they may take 
will enable the bureaucrat interested in innovation to develop strategies 
which will insure progress where progress is necessary. Innovation in 
evaluation is possible. 
FOOTNOTES 

1. One of the major divisions of the Illinois State Board of Education 
is devoted to planning, research and evaluation. In that division I am 
assigned to the Program Evaluation and Assessment Unit. I have conducted 
evaluations in such areas as gifted education, special education, Title I, 
89-313, migrant education, and minimum competency testing. 

2. Methodological Note: Personalistic accounts, autobiographical or 
anthropological, are often suspect. Among other challenges, 
autobiographies are suspect as being ex post facto rationalization, while 
anthropological accounts are criticized as reflecting a limited perspective 
on a complex social order. Those challenges could apply here. That is why 
an attempt to provide some validation of the content of the chapter was 
undertaken. All challenges cannot be met, but even in "scientific" 
enterprises that is true. In this case, two challenges were explored: "Is 
what is reported here in- or out-of-tune with other perceptions of the same 
SEA setting?" and "Is what is reported here in or out of tune with 
perceptions of other settings (especially other SEAs)?" 

In the former case (the same setting), five written responses, 
elaborated by face-to-face interview probes, were obtained from personnel 
in evaluation from the same SEA. The five respondents were selected as 
representing evaluation personnel with longevity of over a year in the 
SEA. Two additional critics were obtained from two other SEAs, and two 
persons with program audit experience in another agency in Illinois were 
also interviewed. All of these persons were invited to provide written 
feedback on the paper: "Identify what agreement or disagreement you have 
with the points listed in the paper." Follow-up interviews by this author 
sought elaboration of the responses. 

A second approach was used to examine if the chapter would reflect 
the experience in other SEAs in any way. A list of representatives of SEAs 
of the Committee on Evaluation and Information Systems (CEIS) of the 
National Council of Chief State School Officers was obtained and reviewed. 
Persons listed there working for SEAs and with job titles which were likely 
to reflect evaluation as a responsibility were sent a copy of the paper and a 
questionnaire. They were asked to read the paper as the basis for 
responding to the questionnaire, which listed 31 items as barriers to 
innovation in evaluation. The reaction of 27 persons was sought. 
Seventeen (63%) replied. A summary of those responses has been provided 
in the chapter. 